The Journal of Educational
Psychology


RAYMOND G. KUHLEN, EDITOR
Syracuse University 


Advisory Editors 


LEE J. CRONBACH, University of Illinois
FRANCIS J. DIVESTA, Syracuse University
CHESTER W. HARRIS, University of Wisconsin
ERNEST R. HILGARD, Stanford University
NICHOLAS HOBBS, George Peabody College for Teachers
HAROLD E. JONES, University of California
IRVING LORGE, Teachers College, Columbia University
DOROTHEA MCCARTHY, Fordham University
T. ERNEST NEWLAND, University of Illinois
H. H. REMMERS, Purdue University
JULIAN C. STANLEY, University of Wisconsin
J. B. STROUD, State University of Iowa
DONALD E. SUPER, Teachers College, Columbia University
PERCIVAL M. SYMONDS, Teachers College, Columbia University
GEORGE G. THOMPSON, Ohio State University


VOLUME 50, 1959 


ARTHUR C. HOFFMAN, Managing Editor 
HELEN ORR, Promotion Manager


PUBLISHED BIMONTHLY BY 
THE AMERICAN PSYCHOLOGICAL ASSOCIATION, INC. 
1333 SIXTEENTH STREET N.W. 
WASHINGTON 6, D. C. 
Copyright, 1959, by the American Psychological Association, Inc. 


CONTENTS OF VOLUME 50 


Ahmann, J. S., and Glock, M. D. An Evaluation of the Effectiveness of a Freshman Mathematics Course
Arnhoff, F. N. Adult Age Differences in Performance on a Visual-Spatial Task of Stimulus Generalization
Bartlett, C. J. Dimensions of Leadership Behavior in Classroom Discussion Groups
Bendig, A. W., and Hountras, P. T. Anxiety, Authoritarianism, and Student Attitude toward Departmental Control of College Instruction
Birney, R. C., and Taylor, M. J. Scholastic Behavior and Orientation to College
Carrier, N. A. A Note on the Effect of Filling Out an “Anxiety Scale” on Examination ...
Christal, R. E. See Krumboltz, J. D.
Coale, J. M. See Sheldon, M. S.
Copple, R. See Sheldon, M. S.
Cosgrove, D. J. Diagnostic Rating of Teacher Performance
Coster, J. K. Some Characteristics of High School Pupils from Three Income Groups
Costin, F. The Effect of an Introductory Psychology Course on Self-Insight
Cronbach, L. J., and Gleser, Goldine C. Interpretation of Reliability and Validity Coefficients: Remarks on a Paper by ...
Daley, M. F. See Norman, R. D.
Davis, F. B. Interpretation of Differences among Averages and Individual Test Scores
Eisner, S., and Rohde, K. Note Taking during or after the Lecture
Feldhusen, J. F. See Klausmeier, H. J.
French, J. W. The Re...
Gilbert, L. C. Speed of Processing Visual Stimuli and Its Relation to Reading
Gleser, Goldine C. See Cronbach, L. J.
Glock, M. D. See Ahmann, J. S.
Guthrie, G. M. See Middleton, G., Jr.
Harris, D. B. A Note on Some Ability Correlates of the Raven Progressive Matrices (1947) in the ...
Haslerud, G. M. Transfer from Context by Subthreshold Summation
Holland, J. L. The Prediction of College Grades from the California Psychological Inventory and Scholastic Aptitude Test
Holland, J. L. Some Limitations of Teacher Ratings as Predictors of Creativity
Hountras, P. T. See Bendig, A. W.
Jackson, P. W., and Getzels, J. W. Psychological Health and Classroom Functioning: A Study of Dissatisfaction with School among Adolescents
Johnson, R. T. See Stone, D. R.
Keislar, E. R. The Development of Understanding in Arithmetic by a Teaching Machine
Kitano, H. L. Refusals and Illegibilities in the Spelling Errors of Maladjusted Children


Klausmeier, H. J., and Feldhusen, J. F. Retention in Arithmetic among Children of Low, Average, and High Intelligence at 117 Months of Age
Knief, Lotus M., and Stroud, J. B. Intercorrelations among Various Intelligence, Achievement, and Social Class Scores
Koenig, Kathryn, and McKeachie, W. J. Personality and Independent Study
Kowatrakul, Surang. Some Behaviors of Elementary School Children Related to Classroom Activities and Subject Areas
Krumboltz, J. D., Christal, R. E., and Ward, J. H., Jr. Predicting Leadership Ratings from High School Activities
Lesser, G. S. The Relationships between Various Forms of Aggression and Popularity among Lower-Class Children
Levinson, B. M. Traditional Jewish Cultural Values and Performance on the Wechsler ...
McKeachie, W. J. See Koenig, Kathryn.
Mathis, C. The Relationship between Salary Policies and Teacher Morale
Meadow, A. See Parnes, S. J.
Medley, D. M., and Mitzel, H. E. Some Behavioral Correlates of Teacher Effectiveness
Merwin, J. C. Rational and Mathematical Relationships of Six Scoring Procedures Applicable to Three-Choice Items
Middleton, G., Jr., and Guthrie, G. M. Personality Syndromes and Academic Achievement
Mitchell, J. V., Jr. Goal-Setting Behavior as a Function of Self-Acceptance, Over- and Underachievement, and Related Personality Variables
Mitzel, H. E. See Medley, D. M.
Neel, Ann F. The Relationship of Authoritarian Personality to Learning: F Scale Scores Compared to Classroom Performance
Norman, R. D., and Daley, M. F. The Comparative Personality Adjustment of Superior and Inferior Readers
Parnes, S. J. Effects of “Brainstorming” Instructions on Creative Problem Solving by Trained and Untrained Subjects
Peck, R. F. Predicting Principals’ Ratings of Teacher Performance from Personality ...
Rivlin, Leanne G. Creativity and the Self-Attitudes and Sociability of High School ...
Rohde, K. See Eisner, S.
Sassenrath, J. M. Learning without Awareness and Transfer of Learning Sets
Schoer, L. See Stroud, J. B.
Schoonover, Sarah M. The Relationship of Intelligence and Achievement to Birth Order, Sex of Sibling, and Age Interval
Schwartz, M. See Webb, W. B.
Sheldon, M. S., Coale, J. M., and Copple, R. Concurrent Validity of the “Warm Teacher Scales”
Stone, D. R., and Johnson, R. T. A Study of Words Indicating Frequency
Stroud, J. B., and Schoer, L. Individual Differences in Memory
Stroud, J. B. See Knief, Lotus M.
Swineford, Frances. Some Relations between Test Scores and Item Statistics
Taylor, M. J. See Birney, R. C.

Thistlethwaite, D. L. College Press and Student Achievement
Thistlethwaite, D. L. Effects of Social Recognition upon the Educational Motivation ...
Webb, W. B., and Schwartz, M. Measurement Characteristics of Recall in Relation to the Presentation of Increasingly Large Amounts of Material
Wilson, C. W. Value Difference between Public and Private School Graduates
Wise, L. M. Abnormal Psychology as a Selective Factor: A Confirmation and Extension
Worell, L. Level of Aspiration and Academic Success


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Volume 50 


February 1959 


Number 1 




ANXIETY, AUTHORITARIANISM, AND STUDENT ATTITUDE 
TOWARD DEPARTMENTAL CONTROL OF COLLEGE 
INSTRUCTION¹

A. W. BENDIG AND PETER T. HOUNTRAS 
University of Pittsburgh 


The amount of control exercised by a 
department over the classroom policies 
and procedures utilized by college teachers 
obviously varies considerably from course 
to course, from department to department, 
and even among different colleges and uni-
versities. The determination of course me-
chanics, such as textbook selection, read- 
ing assignments, testing and grading 
procedures, and multifarious details that 
contribute to formal course structure may 
run the gamut from a laissez-faire situa- 
tion where each teacher is king, selecting 
the procedures he believes are most appro- 
priate for each of his sections, to a policy 
of departmentally imposed uniformity in 
classroom procedures. 

Little attention has been paid to the 
attitudinal response of students to varia- 
tions in the amount of departmental con- 
trol over the structure of their courses. 
McKeachie (1951) and others have argued 
that the grade-oriented anxiety of college 
students is more easily controlled and di-
rected toward course achievement in
courses that are highly structured so that 
the student always knows what is expected 
of him and how the course will be con- 
ducted by the instructor. Previous re- 
search (Lazarus, Deese, & Osler, 1952; 
Teevan & McKeachie, 1954) has empha-
sized the view that the anxiety level of
individual Ss is negatively related to suc-
cessful achievement in complex perform-
ance situations. This hypothesis of Mc-
Keachie suggests that there should be a
positive relationship between measures of
student anxiety level and favorable atti-
tudes toward the departmental structur-
ing of classroom procedures.

¹ The authors wish to express their appreci-
ation to Vincent A. Tamburo, undergradu-
ate honors major in psychology, for his
valuable assistance in part of the statistical
computations.

The work of Adorno, Frenkel-Brunswik,
Levinson, and Sanford (1950) also sug- 
gests that the personality trait of authori- 
tarianism may be related to student pref- 
erences for departmental control. The 
highly authoritarian individual dislikes 
and becomes highly anxious in an un- 
structured and ambiguous social situation 
and expresses strong preferences for for- 
mal structure. The relationship between 
test measures of authoritarianism and 
anxiety has received some attention and 
research has produced contradictory evi- 
dence. Correlations between the F scale 
devised by Adorno et al. (1950) as a meas- 
ure of authoritarianism and Taylor’s Man- 
ifest Anxiety Scale (1953) were reported 
as positive and high by Jones (1953) and 
Davids (1955) and as essentially zero by 
Davids and Eriksen (1957) and Masling 
(1954). The suggestion by Edwards (1957) 
that Taylor’s MAS and similar scales are 
heavily saturated with “social desirabil- 
ity” factors in Ss’ responding to the items 
further complicates the picture. 

The present study was designed to fur-
ther explore the interrelationships among
measures of anxiety, authoritarianism, and 
attitude toward departmental control 
among undergraduate and graduate stud- 
ents in education. 


PROCEDURE 


Scales 


Three separate psychometric scales, 
yielding four scores, were assembled into 
a booklet to be given to each S with the
S's responses to the items recorded on a
single IBM answer sheet. A description 
of the scales is given below. 

Department control. Twenty state- 
ments, purporting to measure Ss’ atti- 
tudes toward the amount of control that 
a college department should exercise over 
the classroom procedures and policies of 
the instructor, were written by the au- 
thors and revised for clarity and unam- 
biguity. Typical statements were the fol- 
lowing: 

The individual instructor should decide 
when to take attendance in his class rather 
than the department requiring him to take 
it once every week. 

Departmental policy should determine the 
kind of tests (essay, true-false, multiple- 
choice) to be given in a course and not the 
individual instructor. 


The Ss were requested to respond to each 
statement on a three-point scale (mildly 
agree, neutral, mildly disagree) as to their 
feelings toward each of the classroom poli- 
cies as they related to all college courses 
that the Ss had taken or might take in the 
future, and were instructed not to limit 
their expressed opinions specifically to the 
course in which they were currently en- 
rolled. The responses were quantified for 
scoring by giving two points to the extreme 
statement (agree or disagree) that, on an 
a priori basis, represented an opinion fa- 
vorable toward departmental control over 
the instructor, while neutral responses 
were given a weight of one point. Thus a 
high score on this scale represents an atti-
tude favorable toward departmental con-
trol, while a low score presumably reflects 
a favorable attitude toward more inde- 
pendence for the classroom instructor. The 
items were approximately counterbalanced 
for direction of scoring to minimize re- 
sponse bias on the part of the Ss with 
nine statements being scored in the “agree” 
direction and 11 statements scored in the 
“disagree” direction. The 20 items were 
randomized for presentation to the Ss. 
For convenience, we will refer to this scale 
as the Instructional Control Attitude Scale 
(ICAS). 
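
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the function name `score_icas`, the response labels, and the two-item key are hypothetical, and the zero weight for the unfavorable extreme response is implied by the text rather than stated in it.

```python
# Hypothetical sketch of ICAS scoring; names and the item key are illustrative.
NEUTRAL = "neutral"


def score_icas(responses, keyed_direction):
    """Sum the item weights for one subject.

    responses: one of "agree", "neutral", "disagree" per item.
    keyed_direction: for each item, the extreme response ("agree" or
    "disagree") judged a priori to favor departmental control.
    """
    if len(responses) != len(keyed_direction):
        raise ValueError("need exactly one response per keyed item")
    total = 0
    for answer, key in zip(responses, keyed_direction):
        if answer == key:
            total += 2  # extreme response favoring departmental control
        elif answer == NEUTRAL:
            total += 1  # neutral response
        # assumed: the opposite extreme response contributes 0 points
    return total


# Two-item key modeled on the typical statements quoted above: the first
# favors the instructor (keyed "disagree"), the second the department.
example_key = ["disagree", "agree"]
```

Under these assumptions a 20-item key would give scores from 0 to 40, with higher scores indicating a preference for departmental control.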

Anxiety. The anxiety scores were de- 
rived from the administration of Cattell’s 
IPAT Anxiety Scale (1957). The 40 tri- 
chotomously-scored items comprising this 
test are divided into 20 “cryptic” or “sub- 
tle” items that have been shown through 
factor analytic studies to be highly loaded 
with the “anxiety” factor and 20 “overt 
symptomatic” or less disguised items that 
are also heavily loaded with the same fac- 
tor. Scores on the first set of items give 
a measure of “covert anxiety,” while scores 
on the second set yield an “overt anxiety” 
measure. Although both scores measure 
the same “anxiety” factor, they may have 
differential validity since the “cryptic” 
items are presumably less influenced by 
the Ss’ tendencies toward giving socially 
desirable responses to personality items, 
a factor that has been emphasized by Ed- 
wards (1957). High scores on both sub- 
scales reflect high degrees of “covert” and 
“overt” anxiety. 

Authoritarianism. Twenty-eight items were
drawn from Forms 45 and 40 used by Adorno
et al. (1950, pp. 255-257) in their studies
of the authoritarian personality. Instead 
of using the cumbersome six-point item 
response method used by the authors of 
the statements, a simpler trichotomous 
response method was used with “agree” 
responses to each item being given a 
weight of two points, “undecided” re- 
sponses a weight of one, and “disagree” 


STUDENT ATTITUDE TOWARD DEPAR TMENTAL CONTROL 3 


responses a weight of zero. High scores on 
this F scale presumably measure strong 
tendencies toward authoritarian attitudes. 

The three groups of test items (depart-
mental control, anxiety, and authoritar-
ianism) were assembled into a single test
booklet with individual sections labelled
“Classroom Policies Questionnaire,” “Self
Analysis Scale,” and “Public Opinion In-
ventory” being preceded by instructions
to the Ss on how to record their item re-
sponses on the separate IBM answer
sheets.


Subjects 


The scales were administered to a to- 
tal of 219 students (104 men and 115 
women) enrolled in eight undergraduate 
and graduate classes on educational psy- 
chology. The 109 undergraduate Ss (31 
men and 78 women) were almost exclu- 
sively sophomore pre-education students 
enrolled in four sections of introductory 
educational psychology while the 110 grad- 
uate Ss (73 men and 37 women) were
graduate students in the School of Educa-
tion working toward advanced degrees and
enrolled in four sections of courses in hu-
man learning and educational research.
All eight sections were taught by three
male faculty members having a Ph.D. de-
gree in psychology and with several years
of college teaching experience. The scales
were administered during one class period
midway in the semester with the Ss re-
quested to indicate only their age, sex, and
academic level (undergraduate or gradu-
ate) on the answer sheets, thus preserving
student anonymity.


RESULTS


To estimate the internal consistency re-
liability of the 20-item Instructional Con- 
trol Attitude Scale (ICAS), a random 
sample of 150 Ss was selected from the 
total group of 219 Ss with the only re- 
striction on the randomness of selection 
being that equal numbers of men and
women Ss were included in the sample.
Of the 150 Ss, 79 were graduate students 
and 71 were undergraduates. The reli- 
ability of the ICAS was computed using 
the variation of Kuder-Richardson For- 
mula 20 suggested by Ferguson (1951) 
for use with trichotomously-scored items 
and the resulting reliability coefficient for 
150 Ss was .66. 
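
For readers who want to see how an internal consistency coefficient of this kind is computed, here is a minimal sketch. The article does not reproduce Ferguson's formula, so the code assumes the general item-variance form of Kuder-Richardson Formula 20 (the form now usually called coefficient alpha), which applies to items scored 0, 1, or 2. The function name `internal_consistency` is hypothetical.

```python
from statistics import pvariance


def internal_consistency(scores):
    """Item-variance form of KR-20 for multi-point items (an assumed form).

    scores: list of subjects, each a list of item scores (e.g., 0, 1, or 2).
    Returns (k / (k - 1)) * (1 - sum of item variances / total-score variance).
    """
    n_items = len(scores[0])
    # Variance of each item across subjects (population variance throughout).
    item_variances = [
        pvariance([subject[i] for subject in scores]) for i in range(n_items)
    ]
    # Variance of the subjects' total scores.
    total_variance = pvariance([sum(subject) for subject in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)
```

The coefficient approaches 1 as the items vary together across subjects and falls toward 0 as they vary independently.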

Each of the four scale scores obtained 
was separately subjected to an analysis
of variance applied to a 2 x 2 factorial 
design with student sex and academic level 
(undergraduate vs. graduate) as the inde- 
pendent variables. Because of dispropor- 
tionate subclass frequencies, the method 
of unweighted means (Snedecor, 1956, pp. 
385-380) was used in these analyses. The 
results are presented in Table 1. The grad- 
uate student Ss had a significantly (.01 
level) lower mean score (10.0) on the 
ICAS than did the undergraduate Ss 
(13.1), while there was not a significant 
sex difference on the ICAS. No significant 
sex or academic level differences were ap- 
parent for either of the two anxiety scores 
(Covert Anxiety and Overt Anxiety). The 
male Ss were significantly (.05 level) more 
authoritarian (mean = 21.9) than were 
the female Ss (mean = 18.8), and there 
was a tendency (significant only at the
.10 level) for the graduates to be less au-
thoritarian (mean = 19.1) than the un- 
dergraduates (mean = 21.5) on the F 
scale. None of the sex by academic level 
interaction mean squares approached sig- 
nificance in any of the four analyses. 
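
Each F ratio in Table 1 is the mean square for an effect divided by the within-subgroups mean square for the same scale. The short sketch below recomputes a few of them from the printed mean squares. The function name `f_ratio` and the dictionary layout are illustrative choices, and the unweighted-means computation of the mean squares themselves is not shown.

```python
def f_ratio(ms_effect, ms_within):
    """F ratio for one effect: effect mean square over the error mean square."""
    return ms_effect / ms_within


# Mean squares as printed in Table 1 for two of the four scales.
MS_WITHIN = {"instr_control": 33.23, "authoritarianism": 86.80}
MS_EFFECT = {
    ("instr_control", "academic_level"): 432.24,
    ("authoritarianism", "sex"): 452.34,
    ("authoritarianism", "academic_level"): 276.41,
}

# Recompute each F ratio, rounded to two decimals as in the table.
f_values = {
    key: round(f_ratio(ms, MS_WITHIN[key[0]]), 2) for key, ms in MS_EFFECT.items()
}
```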

Product-moment correlations among the 
four scales were computed within each of 
the sex groups of the undergraduate-grad- 
uate dichotomy. The four interscale cor- 
relations (one from each of the sex-level 
subgroups of Ss) from each pair of scales 
were tested for homogeneity by the chi- 
square test described by Edwards (1950, 
p. 135), and, when this test gave no evi- 
dence to reject the hypotheses of homoge- 
neous correlations, the coefficients were
averaged by the usual r-to-z method (Ed-
wards, 1950, pp. 133-134). The results of
these computations can be found in Table
2.

TABLE 1
ANALYSES OF VARIANCE OF MEAN DIFFERENCES BETWEEN UNDERGRADUATE AND
GRADUATE STUDENTS ON ATTITUDES TOWARD INSTRUCTIONAL
CONTROL, ANXIETY, AND AUTHORITARIANISM

                           Instr. Control    Covert Anx.     Overt Anx.    Authoritarianism
Source of Variation   df     MS       F       MS      F       MS      F       MS       F
Sex                    1     .62     .02      .93    .01    87.50   2.11    452.34   5.21b
Academic level         1  432.24   13.01c    7.46    .32    36.93    .89    276.41   3.18a
S X AL                 1     .62     .02     2.26    .10     6.73    .16     15.96    .18
Within subgroups     215   33.23            23.42           41.38            86.80

a P < .10
b P < .05
c P < .01

TABLE 2
CORRELATIONS OF THE INSTRUCTIONAL CONTROL ATTITUDE SCALE (ICAS) WITH ANXIETY
AND AUTHORITARIANISM SCALES

                                     Undergraduates         Graduates
                                     Male     Female      Male     Female    Homogeneity     Average
Scales                             (N = 31)  (N = 78)   (N = 73)  (N = 37)  (Chi-Square)   Correlation
ICAS vs. Covert Anxiety              -.26      .23b       -.15      .08        7.93b
ICAS vs. Overt Anxiety               -.05      .17        -.10      .07        1.09           .03
ICAS vs. Authoritarianism             .27      .26b       ...       .09        2.96           .29c
Covert Anx. vs. Overt Anx.            .59c     .66c       .63c      .53c        .98           .62c
Covert Anx. vs. Authoritarianism      .08      .06        .06       .06         .01           .06
Overt Anx. vs. Authoritarianism       .31a     .20a       .00       .28a       3.12           .16b

a P < .10
b P < .05
c P < .01
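
The r-to-z averaging and the chi-square test of homogeneity can be sketched as follows. The code assumes the standard Fisher procedure, in which each transformed coefficient is weighted by N - 3 and the chi-square has one degree of freedom fewer than the number of subgroups. Edwards's (1950) own presentation is not reproduced here, and the function names are hypothetical.

```python
import math


def pooled_correlation(rs, ns):
    """Average several correlations through Fisher's r-to-z transformation."""
    weights = [n - 3 for n in ns]          # assumed weights: N - 3 per subgroup
    zs = [math.atanh(r) for r in rs]       # r-to-z
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)                # z back to r


def homogeneity_chi_square(rs, ns):
    """Chi-square for homogeneity of k correlations, with k - 1 df."""
    weights = [n - 3 for n in ns]
    zs = [math.atanh(r) for r in rs]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))


# Subgroup sizes: undergraduate men and women, graduate men and women.
SUBGROUP_NS = [31, 78, 73, 37]
```

Applied to the rounded Covert vs. Overt Anxiety coefficients (.59, .66, .63, .53), the first function returns an average of about .62. Applied to the ICAS vs. Covert Anxiety coefficients (-.26, .23, -.15, .08), the second returns a chi-square of about 7.9, close to the reported 7.93; the small gap presumably reflects rounding of the printed coefficients.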

Covert Anxiety and Overt Anxiety
showed a consistently high average inter-
correlation (r = .62) as might be expected
since both subscales were designed to
measure the same “anxiety” factor. Co-
vert Anxiety is not related to Authoritar-
ianism (r = .06) while the average corre-
lation between Overt Anxiety and the F
scale (r = .16) is significant at the .05
level with three of the four intrasubgroup
coefficients being significant at the .10
level.

Scores on the ICAS were significantly 
(.01 level) related to Authoritarianism for 
the total group (r = .29) with Ss high on 
the F scale preferring more departmental
control over course procedures. Although
ICAS scores were unrelated to Overt Anx-
iety (r = .03), the pattern of subgroup
correlations between the ICAS and Co-
vert Anxiety scales is somewhat puzzling.
The chi-square test of the homogeneity of
the subgroup correlations between ICAS
and Covert Anxiety was significant at the
.05 level (chi-square = 7.93 with three de-
grees of freedom), precluding the pooling
of the subgroups to compute the average
correlation between these two scales as
was done with other interscale correla-
tions. Critical ratio tests of the signifi-
cance of the differences between pairs of
individual subgroup correlations, using the
r-to-z technique described by Edwards
(1950, pp. 131-132), indicated that the
difference between the interscale correla-
tions for the undergraduate male and fe-
male Ss (r's = -.26 and .23) was sig-
nificant at the .05 level (z = 2.25), while
the difference for the graduate sex sub-
groups (r's = -.15 and .08), although in
the same direction, was not significant
(z = 1.11). Neither the difference in cor-
relations between the male undergraduate
and graduate subgroups (r's = -.26 and
-.15), nor the difference between the two
female subgroups (r's = .23 and .08) ap-
proached statistical significance (z's =
.49 and .75). Pooling of the undergradu-
ate and graduate subgroups within each
sex group indicated that both average
interscale correlations (r's = -.18 and .19)
were different from zero, but only at the
.10 level of confidence, and that the dif-
ference between these two average correla-
tions (Edwards, 1950, p. 136) was sig-
nificant at the .01 level (z = 2.64). Ap-
parently the lack of homogeneity of the
four subgroup correlations between the
ICAS and Covert Anxiety scales, found in
the original chi-square test, reflects a sex
difference, and not an academic level dif-
ference, that is quite apparent in the
undergraduate subgroups and also ap-
pears, to a lesser extent, in the graduate
Ss.
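
The critical ratio for the difference between two independent correlations can be illustrated with a short sketch. It assumes the usual Fisher z procedure, in which the difference between the transformed coefficients is divided by the square root of 1/(N1 - 3) + 1/(N2 - 3). The function name is hypothetical, and the inputs are the rounded coefficients printed in the text.

```python
import math


def correlation_difference_z(r1, n1, r2, n2):
    """Critical ratio for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)               # r-to-z
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error


# ICAS vs. Covert Anxiety: women compared with men at each academic level.
undergraduate_z = correlation_difference_z(0.23, 78, -0.26, 31)
graduate_z = correlation_difference_z(0.08, 37, -0.15, 73)
```

With these inputs the ratios come to roughly 2.26 and 1.11, in line with the reported 2.25 and 1.11; the small discrepancy in the first presumably reflects rounding of the printed coefficients.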


DISCUSSION


The results of the analyses of variance
of the four test scores directly contradict
the common conception of the graduate
student in education as being much more
anxious and authoritarian than the under- 
graduate student. The graduate Ss scored 
significantly (.01 level) lower on the ICAS 
and there was a tendency, significant only 
at the .10 level, for them to show a lower 
mean score on the F scale. In addition, no 
significant differences between the two 
academic groups of Ss were found on 
either of Cattell’s anxiety scales and both 
of the F ratios testing the significance of 
the differences between graduate and un- 
dergraduate Ss were less than unity. 
Whether it is the selection process in de- 
termining which Ss continue on into grad- 
uate education classes, the age difference 
between the two groups of Ss, or whether 
it is the actual experience of teaching that 
results in lessened authoritarian attitudes 
in our sample of graduate education stu- 
dents cannot be determined from the data 
collected for this study. 

The hypothesized positive correlation 
between ICAS and Authoritarianism was 
confirmed although the relationship was 
not large (r = .29). The only moderate 
reliability of the ICAS (r = .66) prob- 
ably reduced the apparent extent of the 
correlation between these two variables. 
However, this result substantiates the con- 
cept that authoritarian verbal responses to 
general social attitude questions generalize 
to expressed preferences for departmental 
control over college classroom procedures. 
The authoritarian would be happier in a 
well-structured course where the struc- 
ture is imposed by the department. 

The relationship between attitude to- 
ward instructional control and objective 
measures of anxiety is not as simple as 
we had originally assumed. The correlation 
between the ICAS and Overt Anxiety 
was essentially zero (r = .03). The ICAS 
appeared to be weakly related to Covert 
Anxiety, but the relationship was reversed 
between the two sex groups with the cor-
relation being negative for men (r = -.18)
and positive for women (r = .19). The
discussion by Edwards (1957) of “social
desirability” factors in responding to “sub-
tle” and “obvious” personality test items
suggests that the Overt Anxiety scores
were contaminated by the “social desir-
ability” variables and this may have ob-
scured the relationship of anxiety, as
measured by the Overt Anxiety scale, to 
instructional control attitudes as measured 
by the ICAS. The Covert Anxiety scale, 
presumably being less contaminated by 
“social desirability,” may better reflect 
the relationship of anxiety and instruc- 
tional control attitudes. However, the ap- 
parent sex difference in the direction of 
the correlation between the ICAS and Co- 
vert Anxiety, if not a result of sampling 
error, offers a fertile ground for specula- 
tion. Covertly anxious men students pre- 
fer their instructor to have complete con- 
trol over course procedures while covertly 
anxious women students prefer the de- 
partment to structure the course. This 
may be due to most college courses having 
male instructors and male students feel- 
ing greater confidence and empathy with 
their like-sex instructors, whereas female 
students tend to distrust the control of 
their predominantly male instructors over 
course procedures and prefer the institu- 
tional restrictions of the neuter depart- 
ment over course structure. Another 
possibility is that the men Ss, being sig- 
nificantly more authoritarian than the 
women Ss, can more easily empathize with 
and trust the authoritarian figure of the 
course instructor, regardless of the sex 
of the instructor, with the more covertly 
anxious men Ss relying more on the more 
immediate authority figure of the instruc- 
tor rather than the more nebulous author- 
ity control of the department. The cov- 
ertly anxious women Ss, somewhat lower 
in authoritarianism, may distrust the au- 
thoritarian image of the instructor and 
prefer the socially more distant institu- 
tional control of the department. Essen- 
tially this hypothesis suggests that au-
thoritarianism level influences the direction
of the relationship between covert anxiety 
and preferences for course control with 
the correlation between these two vari- 
ables being negative among authoritarians 
and positive among nonauthoritarians, re- 
gardless of the sex of the Ss. Obviously 
our data do not permit a test of the 
comparative adequacy of alternative ex- 
planations of the significant sex differ- 
ences in obtained correlations, one based 
upon a sex difference and the other upon 
an authoritarianism difference in student 
perceptions of the role of the college 
teacher. 


SUMMARY 


A 20-item scale of student attitude to- 
ward departmental control of college 
teachers (the Instructional Control Atti- 
tude Scale) was constructed and found to 
be moderately reliable (r = .66). This
scale, along with questionnaires measur-
ing covert anxiety, overt anxiety, and au-
thoritarianism, was administered to 219
undergraduate and graduate students in 
education. The graduate education Ss pre- 
ferred less departmental control of their 
college teachers and were less authoritar- 
ian than were the undergraduate Ss. No 
differences were found on either anxiety 
scale. Favorable attitudes toward depart- 
mental control were (a) positively corre-
lated with general authoritarianism (r =
.29) for all Ss, (b) negatively correlated
with covert anxiety for male Ss (r =
-.18), (c) positively correlated with cov-
ert anxiety for female Ss (r = .19), and
(d) not correlated with the measure of 
overt anxiety (r = .03). 


REFERENCES 


Adorno, T. W., Frenkel-Brunswik, Else,
Levinson, D. J., & Sanford, R. N. The
authoritarian personality. New York:
Harper, 1950.

Cattell, R. B. Handbook for the IPAT
Anxiety Scale. Champaign, Ill.: Inst. for
Pers. and Ability Testing, 1957.

Davids, A. The influence of ego-involvement
on relations between authoritarianism
and intolerance of ambiguity. J. ab-
norm. soc. Psychol., 1955, 51, 415-420.

Davids, A., & Eriksen, C. W. Some social
and cultural factors determining rela-
tions between authoritarianism and
measures of neuroticism. J. consult. Psy-
chol., 1957, 21, 155-159.

Edwards, A. L. Experimental design in psy-
chological research. New York: Rine-
hart, 1950.

Edwards, A. L. The social desirability vari-
able in personality assessment and re-
search. New York: Dryden, 1957.

Ferguson, G. A. A note on the Kuder-Rich-
ardson formula. Educ. psychol. measmt.,
1951, 11, 612-615.

Jones, M. B. Aspects of the autonomous per-
sonality: I. Manifest anxiety. U.S. Na-
val Sch. Aviat. Med. Res. Rep., 1953,
Proj. No. NM 001 058.25.03.

Lazarus, R. S., Deese, J., & Osler, Sonia F.
The effects of psychological stress upon
performance. Psychol. Bull., 1952, 49,
293-317.

McKeachie, W. J. Anxiety in the college
classroom. J. educ. Res., 1951, 45, 153-
160.

Masling, J. M. How neurotic is the authori-
tarian? J. abnorm. soc. Psychol., 1954,
49, 316-318.

Snedecor, G. W. Statistical methods (5th
ed.) Ames, Iowa: Iowa State Coll. Press,
1956.

Taylor, Janet A. A personality scale of man-
ifest anxiety. J. abnorm. soc. Psychol.,
1953, 48, 285-290.

Teevan, R., & McKeachie, W. J. Effects on
performance of different instructions in
multiple-choice examinations. Mich.
Acad. Sci., Arts, and Letters, 1954, 39,
467-475.


Received October 8, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 1, 1959


SPEED OF PROCESSING VISUAL STIMULI AND 
ITS RELATION TO READING 


LUTHER C. GILBERT 


University of California, Berkeley 


Tachistoscopic records have demon- 
strated repeatedly that both good and 
poor readers at the college level can iden- 
tify the visual stimuli of phrases of sense 
material after exposure for a period which 
is only a fraction of the time required for 
the fixation pause in reading easy prose. A 
possible explanation of the phenomenon 
appears to lie in the use of the after image 
and the memory after image in the tach-
istoscopic tests. It appears possible that
the use of these images tends to give a 
false measure of the speed and span of 
visual perception. It also appears possible 
that maximum use of such images may 
compensate for certain processing defi- 
ciencies. 

It has occurred to the writer that the 
experimental interruption of the use of 
the retinal images is possible and that this 
technique can be employed to give a bet- 
ter understanding of the discrepancy be- 
tween the speed and span of visual per- 
ception as measured by the tachistoscopic 
tests and the speed of reading easy prose. 


PURPOSE OF THE INVESTIGATION


The purpose of this study was to in- 
vestigate the influence of varying the proc- 
essing times for the first stimulus before 
the Ss were permitted to encounter an in- 
terfering stimulus. The data were ana-
lyzed (a) to determine something of the 
nature of individual differences in speed 
of processing visual stimuli and (b) to de-
termine the relationship between the accuracy
of perception and reading ability. 

Studies relating to the broad topic of 
speed and span of visual perception are 
too numerous to review here in detail. 
Many of these studies are summarized
critically in the reports of Arnoult and
Tinker (1939), Huey (1912), Vernon 
(1931), and Woodworth and Schlosberg 
(1954). 

These earlier studies presented evidence 
to show that adult Ss could see four or five 
short familiar words in 1/10 or even 1/100 
of a second, provided the words formed 
a simple phrase or sentence and the Ss 
were not required to make a saccadic 
movement or encounter interfering visual 
stimuli immediately before or after the 
flashed material. 

In attempting to measure speed of vis- 
ual perception some investigators had the 
Ss press a key as soon as the words
were recognized. The most commonly used
method was to expose the words for a
short period and then ask the Ss to
report the words they recognized after
the stimuli were removed from the screen.
This method seems to be more a measure 
of speed of vision than speed of percep- 
tion, since the procedure permits the Ss 
to continue the processing of the ma- 
terial after the flashed words are re- 
moved from the screen. The method fails 
to identify and evaluate the processing 
time, and consequently yields inexact 
measures of speed of visual perception. 

In a preliminary study (Gilbert, 1957) 
using 68 Ss, it was found that when two 
words were flashed on the screen for 1/24 
of a second and followed immediately by 
nonsense letters¹ in the same place on the
screen, only 3.68% of the words were cor- 
rectly reported. Only one S reported cor- 
rectly as many as 20% of the words. Since 
the writer was interested in measuring the
speed with which the readers could avoid
the influence of interfering stimuli on
perception of sense material, these nonsense
letters seemed to constitute a good test of
interference.

¹ Letters typed in a random order and in
a manner to avoid the formation of sense
material.


Subjects 


The Ss included 64 college juniors,
seniors, and graduates, who were members of
classes in educational psychology. Of the 
64, 41 were women and 23 men. For all of 
them, the native language was English. 

The rate of reading on a standard level, 
nonfiction article of approximately 2,300 
words ranged from 143 words per minute 
to 464 words per minute. The average for 
the group was 262 with a standard
deviation of 74.14.


Materials 


Reading tests. For measuring the speed
and comprehension of reading, a 2,307-word
test adapted from one of the University
of California weekly broadcasts²
was used. This test and a second broadcast
test,³ similar in character to the first,
were given to 110 Ss. The correlation
between the two tests was .85 ± .02, which
indicates a reliability sufficiently high to
justify using it as a group test.

Span of perception tests. The six tests
used to measure span of perception were
similar in every way. All the words
were short and thoroughly familiar. The
phrases gradually increased from one word
to five words and from one letter to 25
letter spaces. Each test was composed of
five spans, increasing from a one-word
span to a five-word span. There were five
test items for each span. The material was
all typed in lower case on the same IBM
executive typewriter using Warren High
Gloss paper and black carbon ribbon. The
material was then photographed with a
Cine-Kodak Special II camera at a distance
of nine inches to give a large clear
type on the screen. A Bell and Howell
projector set at 24 frames per second was
used for flashing the material on the
screen. Plus-X Reversal Safety film was
used in the negative form in order to get
black words on a soft eggshell-color
background. A blank page of the Warren High
Gloss paper was photographed using
enough film to permit the projector to run
for three seconds before the words
appeared and two seconds after they had
disappeared. This allowed ample time for
starting and stopping the projector and at
the same time for controlling the
illumination on the screen.

² Broadcast Number 2573, U. E. Number
[illegible], University of California Radio
Service, October 29, 1944.
³ Broadcast Number 1254, U. E. Number
[illegible], University of California Radio
Service, February 24, 1952.

The material was projected on a Da-lite
screen in a room with the lights turned
down to where the Ss were just able to 
see to write their answers, but not dark 
enough to produce full dark adaptation. 
Each S was asked if he could see clearly 
the material as it appeared on the screen. 
The Ss who reported that the words 
blurred were eliminated from the study. 
Only six Ss were lost for this reason. A 
further check on vision was made by flashing
two words on the screen for 2/24 of a
second without an interfering stimulus
following. The 64 Ss reported correctly 
98.43 per cent of these words. It seems 
reasonably certain that defective vision 
was not an important factor in the find- 
ings. 

The nonsense letters used as interfering 
stimuli were typed in lower case on the 
same typewriter used for typing the words. 
The letters were typed in a random order 
and in a manner to avoid the formation 
of sense material. In each case a sufficient 
number of nonsense letters was used to 
extend at least two letter spaces beyond 
the beginnings and ends of the words. 

The three examples of test items which 
follow show (a) the sense material and
(b) the nonsense letters used as interfering
stimuli:


Item 1 
(a) words 
(b) lupytkoie 
Item 2 
(a) his best shoes 
(b) bnvermxztghjfedksl 
Item 3
(a) they will come home early
(b) zaqwsxcderfvbgtyhjshuiklopzxc


The first test in this group was used to 
measure the span of perception when the 
Ss were allowed unlimited time for use of 
the images after the words left the screen. 
This span is designated the “basic” span. 
This is the kind of tachistoscopic span
commonly reported in the literature and 
the kind commonly used when training 
for speed of visual perception. The per- 
centage of words correctly reported on 
each test as a whole is designated “accu- 
racy” of perception. The other five tests 
used to measure span and accuracy of 
perception were all designed in a manner 
to control the time allowed for processing 
the words being flashed on the screen. 
This was accomplished by increasing the 
time the words were left on the screen 
before superimposing nonsense letters on 
the same spot on the screen where the 
words had been presented. For example, 
the designation 2-0-2 for the first test 
means that the words were photographed 
on two successive frames and were fol- 
lowed without a blank frame by nonsense 
letters on two successive frames. In other 
words, the words were left on the screen 
for 2/24 of a second and were followed 
immediately by nonsense letters. In each 
case the nonsense letters were left on the 
screen for 2/24 of a second. Each control
test increased by 1/24 of a second the 
length of time the material was left on 
the screen. Thus in the fifth test in the 
series the words were left on the screen 


for 6/24 of a second before nonsense let- 
ters were superimposed. The exposure time 
for the nonsense letters was held constant 
at 2/24 of a second in each test. 
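
The frame designations described above translate directly into durations, since the projector ran at 24 frames per second. The following is a minimal sketch of that arithmetic. The function name `parse_designation` and the dictionary keys are illustrative assumptions, not part of the original procedure, and the reading of the middle field as a count of blank frames follows the description of 2-0-2 given above.

```python
from fractions import Fraction

FRAMES_PER_SECOND = 24  # projector speed reported in the Materials section


def parse_designation(designation: str) -> dict:
    """Convert a designation such as "3-0-2" into durations in milliseconds.

    The three fields are read as frames of words, blank frames, and frames
    of nonsense letters (a final 0 means no interfering stimulus followed).
    """
    words, blank, mask = (int(part) for part in designation.split("-"))
    frame_ms = Fraction(1000, FRAMES_PER_SECOND)  # one frame is about 41.67 ms
    return {
        "words_ms": float(words * frame_ms),
        "blank_ms": float(blank * frame_ms),
        "mask_ms": float(mask * frame_ms),
    }


# The five controlled tests, followed by the basic span test.
for name in ["2-0-2", "3-0-2", "4-0-2", "5-0-2", "6-0-2", "2-0-0"]:
    d = parse_designation(name)
    print(name, round(d["words_ms"], 1), round(d["mask_ms"], 1))
```

On this reading the words stayed on the screen for roughly 83 ms in the 2-0-2 test and 250 ms in the 6-0-2 test, while the nonsense letters always lasted roughly 83 ms.
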
Preliminary instructions requested each 
S to fixate at the point on the screen where 
three small dots appeared and to try to 
see the words which appeared on the 
screen. The center dot was placed at
exactly the mid-point for the sense material.
A number of examples familiarized the Ss 
with the procedure. In each case the dots 
were left on the screen for exactly the same 
length of time, i.e., 8/24 of a second, and
were followed by two blank frames, i.e.,
2/24 of a second. This made it possible 
for the Ss to anticipate the words. After 
each phrase was flashed, time was allowed 
for the S to write the words. The Ss were 
tested in small groups of not more than 
25 and no S was more than 25 feet from 
the screen and not more than three chairs 
from the center of the screen. Further evi- 
dence that the words on the screen could 
be seen was the fact that the group as a 
whole reported accurately more than 98% 
of the words up to and including the
two-word level on the uncontrolled span.

The same procedure (used for the basic
span of perception) was followed with the 
tests for the controlled spans, with the ex- 
ception that the instant the sense material 
left the screen there appeared at the same 
spot on the screen nonsense letters. The 
Ss were instructed to ignore as far as 
possible the nonsense letters and to re- 
member the phrases of sense material. 


Comparison of the Basic and Control 
Spans of Perception 


Table 1 presents data showing the per- 
centage of the words correctly reported on 
the basic span and on the various con- 
trolled spans for each word level and the 
test as a whole. It is interesting to note 
that when the Ss used a very narrow span 
of one and two words nearly all the Ss 
were able to arrive at a very high level
of efficiency even when allowed only 3/24
of a second before the interfering stimulus
was superimposed. Also of interest is the
fact that the greater the span of perception
the Ss were required to use, the
longer the sense material had to be left
on the screen before superimposing the
nonsense letters in order for the Ss to
achieve the level of the basic span. The
data presented in Table 1 bring into sharp
focus the fact that in visual perception
such as is used in reading sense material,
the fixation pause must be long enough in
duration to allow time not only to see but
also to process the visual stimuli. It
may well be that individual differences
in speed of processing visual materials
are an influential factor in both the
span of perception and the length of the
fixation pauses used in reading easy prose.

If the data in Table 1 were presented
graphically they would show the group
curves of the controlled spans crossing the
lines for the basic spans at various time
levels. The group curve for the controlled
test of accuracy of perception crosses the
line for the basic test between the levels
of 5/24 and 6/24 of a second. It is
interesting to note that this point of crossing
is close to the average length of the
fixation pauses commonly found for mature
readers reading simple prose.

The data in Table 1 show a marked in- 
crease in the percentage of words reported 
from the 2/24 to 6/24 of a second, with 
the greatest gain taking place between the 
2/24 and the 3/24 time units. At the level 
of 4/24 of a second the group had reached
a level of efficiency on the control span
which was slightly above the basic spans
of three words or fewer. By the level of
5/24 of a second the control span of four
words had equaled or exceeded the
four-word basic span. By the level of 6/24 of
a second the imposed stimuli had no
measured detrimental influence on the group's
five-word span of perception. However,
some of the very slow readers were still 
experiencing some interference. 
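
As a consistency check on Table 1, the "percentage of total" for a test can be recomputed from the percentages at each span length. The sketch below assumes that each span level contributed five items, so that an n-word level carries five times n words. That weighting and the function name are assumptions made here for illustration; the article does not state how the total was computed. The values are the basic span (2-0-0) column of Table 1, read as the one-word through five-word rows.

```python
def total_percentage(pct_by_span: dict, items_per_span: int = 5) -> float:
    """Word-weighted mean of the per-span percentages.

    pct_by_span maps span length in words to the percentage of words
    correctly reported at that span length.
    """
    # Number of words shown at each level: items per level times span length.
    words = {n: n * items_per_span for n in pct_by_span}
    correct = sum(pct_by_span[n] * words[n] for n in pct_by_span)
    return correct / sum(words.values())


# Basic span (2-0-0) column of Table 1, one- through five-word spans.
basic = {1: 99.37, 2: 98.43, 3: 95.94, 4: 91.63, 5: 75.56}
print(round(total_percentage(basic), 2))  # 88.56
```

Under this assumption the recomputed figure agrees with the tabled total of 88.56 for the basic span, which also supports reading the five data rows as the one-word through five-word spans.
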

These data also indicate that the
noninterference times of 4/24 or 5/24 of a
second following the flashing of the words
on the screen constitute a very important
factor in the span of perception. Furthermore,
they indicate that the time words
are left on the screen is not the only factor
which should concern us in studying
speed and span of perception. Note that
words left on the screen for 2/24 of a
second and not followed by other visual
stimuli were as accurately reported as
words left on the screen for 5/24 of a second
and followed by interfering visual
stimuli. Both the length of time the words
are left on the screen and the length of the
period free from interfering stimuli are
important factors. These data suggest the
possibility that some readers may use part
of their fixation time to avoid interference
from a new stimulus during the period
they need free for processing the visual
stimulus. In other words, part of the
fixation time may be preventive in nature.


TABLE 1
MEAN PERCENTAGE OF WORDS CORRECTLY REPORTED ON THE VARIOUS SPANS
(64 Ss)

                          Time Units: Percentage Correct in Each Unit
Length of Span in Words | 2-0-2 | 3-0-2 | 4-0-2 | 5-0-2  | 6-0-2 | 2-0-0 (Basic Span)

One Word                | 81.95 | 97.81 | 98.75 | 100.00 | 99.68 | 99.37
Two Words               | 86.71 | 97.81 | 99.06 |  99.53 | 99.68 | 98.43
Three Words             | 64.44 | 93.74 | 96.24 |  98.22 | 96.56 | 95.94
Four Words              | 56.14 | 83.82 | 85.54 |  93.04 | 92.65 | 91.63
Five Words              | 36.06 | 64.06 | 66.50 |  71.93 | 81.82 | 75.56
Percentage of total     | 58.38 | 82.01 | 84.01 |  88.36 | 91.14 | 88.56
σ                       | 17.76 | 12.94 | 10.54 |   9.50 |  9.64 | 10.82


Comparison of Rate of Reading and Speed 
and Accuracy of Perception 


For the purpose of showing the differences
in rate of processing the sense material
for readers of different levels of ability
the group of 64 Ss was ranked from the
best to the poorest on the gross reading
rate and then divided into fourths. The
best 25% are designated as S₁ and the
poorest 25% are designated as S₄. (S₂ and
S₃ constitute the middle 50%.) Table 2
shows the difference between the slow
reading group (S₄) and the fast readers
(S₁) in speed of processing sense materials.
When the sense material was left on the
screen for 3/24 of a second, then followed
by an interfering stimulus, the S₁ group
got a considerably higher percentage of the
material than the S₄ group did when the
sense material was left on the screen for
6/24 of a second, then followed by an
interfering stimulus. Another point of interest
in these data is the fact that the S₄
subjects experienced greater difficulty in
avoiding interference from the interrupting
stimuli than was true for the S₁
students.


Correlation Between Speed and Accuracy 
of Perception and Rate of Reading 


This positive association between rate 
of processing simple phrase material and 
the gross reading rate of simple prose ma- 
terial is brought out a little more clearly 
in Table 3 which shows the coefficients of 
correlation for the percentage of words 
processed on each test of perception and 
the gross reading rate. The magnitude of 
these correlations with the gross reading 
rate decreases in an irregular pattern 
from the 2-0-2 time unit to the 6-0-2 
time unit level. It should be remembered
that the average accuracy for the 6-0-2
time unit test is similar in magnitude to
that on the basic span test and at this
time level there seemed to be very little
influence resulting from the nonsense letters
as reflected in the group average.
(However, a few of the poor readers were
still experiencing some interference.)
Therefore it is not surprising that the
correlation between scores on the 6-0-2
test and gross reading rate is similar in
magnitude to that found between the basic
span test and gross reading rate.


TABLE 2
MEAN PERCENTAGE OF WORDS CORRECTLY REPORTED IN THE VARIOUS TIME
UNITS BY S₁ AND S₄ GROUPSᵃ
(64 Ss)

                                          Time Units                  Basic Span
Subjects                    | 2-0-2 | 3-0-2 | 4-0-2 | 5-0-2 | 6-0-2 | 2-0-0

S₁ Best 25% of the Readers  | 78.49 | 93.07 | 92.41 | 96.08 | 97.17 | 94.56
   SD                       | 14.14 |  4.66 |  5.07 |  3.14 |  3.02 |  5.53
S₄ Poorest 25%              | 39.74 | 71.74 | 77.60 | 81.20 | 86.91 | 83.41
   SD                       | 19.15 | 13.28 | 11.81 |  8.12 |  6.96 |  8.50
Mean Difference             | 38.75 | 21.34 | 14.75 | 14.84 | 10.26 | 11.15
t ratioᵇ                    |   6.4 |   5.9 |   4.5 |   5.8 |   5.2 |   4.3

ᵃ Ranked on gross reading rate.
ᵇ All the t ratios are significant at better than the .01 level.

Table 3 also reveals that the correlations
between the speed and accuracy of
processing the phrases and the effective
reading rate⁴ are slightly larger than, but
similar in magnitude to, those reported
for the gross reading rate. The
effective reading rate seems to reflect a
little better than the gross reading rate
the speed with which the Ss can process
the stimuli which they receive from each
fixation pause. If this assumption is true,
it would seem logical that the correlations
between effective reading rate and speed
and accuracy of processing the phrases
would be a little higher than for the gross
rate and speed and accuracy of processing
the stimuli. However, since the correlation
between the gross and effective rates for
this group of Ss was .86 ± .012, no great
differences in the magnitude of the corre- 
lations between gross rate and processing 
of phrases and effective rate and process- 
ing of phrases should be anticipated. 
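
Two of the quantities behind Table 3 are simple to compute. The effective reading rate is defined in the article's footnote as the gross rate multiplied by the percentage of correct answers on the comprehension test. The article does not say how the standard errors of the correlations were obtained; the sketch below assumes the large-sample formula (1 - r²)/√(N - 1), which reproduces several of the tabled values for N = 64. The function names and the example reader are hypothetical.

```python
import math


def effective_rate(gross_wpm: float, pct_correct: float) -> float:
    """Gross words per minute scaled by comprehension (pct_correct in 0-100)."""
    return gross_wpm * pct_correct / 100.0


def se_of_r(r: float, n: int) -> float:
    """Assumed large-sample standard error of a correlation coefficient."""
    return (1.0 - r * r) / math.sqrt(n - 1)


# Hypothetical reader: 262 words per minute with 80% comprehension.
print(effective_rate(262, 80))        # 209.6

# Gross reading rate correlations for the 2-0-2 and 4-0-2 tests (Table 3).
print(round(se_of_r(0.63, 64), 3))    # 0.076
print(round(se_of_r(0.48, 64), 3))    # 0.097
```

Because the assumed formula depends only on r and N, it offers a quick way to spot tabled standard errors that may have been misprinted or misread.
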


Eye Movements 


The evidence presented in the literature
on visual perception in reading establishes
quite clearly the tendency for slow readers,
in reading simple prose, to use a
smaller span of visual perception and a
longer fixation pause than do faster readers.
There are, however, many exceptions
to this tendency. These exceptions are
doubtless, in many instances, due to the
interaction of certain of the major factors
involved. For example, the evidence on
speed and accuracy of processing visual
stimuli of simple phrases of sense material
shows that short phrases can be processed
faster and with fewer errors than
longer phrases. Consequently certain slow
readers may, by keeping their units of
visual perception very small, be able to
read simple prose with shorter than average
fixation pauses. The record of Subject
R is a good illustration of this type of
reading pattern. She read simple prose at
the rate of 186 words per minute, with
good comprehension. Her eye movement
records show that she used 80 fixations
per 100 words with average fixation pauses
of 6.4 thirtieths of a second. Her fixation
pauses were much shorter than the average
for slow readers. On the speed of perception
tests she demonstrated a high degree
of speed and accuracy of processing
short phrases of sense material, but was
very slow and inaccurate in processing
long phrases. The record of Subject P reveals
a very different pattern. This S read
simple prose at 176 words per minute. Her
eye movement records show that she used
53 fixations per 100 words, but with an
average fixation pause of 9.4 thirtieths of
a second. Her wide span of perception and
unusually long fixation pauses reveal a
very different pattern from that of R. The
speed of perception tests reveal that P
needs a considerably longer interval of
time free from interfering stimuli than
the average college student in order to
make use of her larger than average span
of visual perception.

⁴ The effective reading rate is computed
by multiplying the gross number of words
by the percentage of correct answers
on the test of comprehension.


TABLE 3
RELATION OF PERCENTAGE OF WORDS CORRECTLY PERCEIVED TO GROSS
READING RATE AND EFFECTIVE READING RATE
(64 Ss)

                      Time Units                     Basic Span
    | 2-0-2 | 3-0-2 | 4-0-2 | 5-0-2 | 6-0-2 | 2-0-0

Gross Reading Rate
r   | .63   | .58   | .48   | .56   | .22   | .32
SE  | .076  | .083  | .097  | .087  | .019  | .080

Effective Reading Rate
r   | .68   | .63   | .57   | .58   | .30   | .23
SE  | .067  | .075  | .085  | .084  | .114  | .120

The fixation pauses in reading serve 
three purposes. First, the eyes are much 
more efficient in transmitting the visual 
stimuli to the cortex when at rest than 
when in motion. Therefore, the eyes are 
stopped along the line of print in order 
to achieve maximum functional efficiency. 
Second, in order to achieve maximum 
efficiency in processing the visual stimuli, 
the retina or cortex needs an interval of 
time free from interfering visual stimuli. 
The length of this uninterrupted period is 
determined by the length of the fixation 
pause. Individual differences in the length 
of time needed for retinal or central proc- 
essing of the visual stimuli no doubt ac- 
count in part for individual differences in 
the length of the fixation pauses in read- 
ing simple prose. The first two purposes 
are prerequisites to the third, namely, 
providing time needed to comprehend the 
ideas and relationships involved. The
substantial correlation of .50 ± .14 between the
speed and accuracy of visual perception 
as measured by Test 2-0-2 and the length 
of the fixation pauses in reading simple 
prose seems to support this theory. 


Summary of Findings 

The findings of the investigation may 
be summarized as follows: 

1. Among college students the span of
visual perception is unequally influenced
by an increased restriction in the period
of freedom from interfering visual stimuli
following the presentation of simple phrase
material.

2. For the average college student, if 
the phrases are left on the screen for 1/5 
or 1/4 of a second before the extraneous 
visual material is presented, the extrane- 
ous visual material has little influence on 
the span of visual perception. 

3. In exceptional cases, Ss needed more 
than 1/4 of a second to be able to avoid 
the interference of the extraneous mate- 
rial. 

4. The narrower the span of perception 
the easier it is to avoid the influence of 
the extraneous visual material. 

5. Interfering stimuli have a greater in- 
fluence on the span of visual perception 
for the slow readers than they do for the 
fast readers. 

6. The greater the degree of freedom 
from interfering stimuli the lower the cor- 
relation between span of visual perception 
and rate of reading. 

7. There is a substantial correlation be- 
tween the length of the fixation pauses Ss 
use in reading simple prose material and 
the speed with which the Ss can process 
tachistoscopically-presented stimuli result- 
ing from simple phrases. 


REFERENCES 


Arnoult, D. C., & Tinker, M. A. The fixational
pause of the eyes. J. exp. Psychol.,
1939, 25, 271-280.

Gilbert, L. C. Influence of interfering stimuli
on perception of meaningful material.
Calif. J. educ. Res., 1959, 10, 15-23.

Huey, E. B. The psychology and pedagogy
of reading. Macmillan, 1912.

Vernon, M. D. The experimental study of
reading. Cambridge Univer. Press, 1931.

Woodworth, R. S., & Schlosberg, H. Experimental
psychology. Henry Holt,
1954.


Received May 30, 1958.




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 1, 1959 


SACCADIC MOVEMENTS AS A FACTOR IN 
VISUAL PERCEPTION IN READING 


LUTHER C. GILBERT 
University of California, Berkeley 


It is the purpose of this investigation to 
study the saccadic movements of the eyes 
in their relationship to visual perception. 
Other data in the literature of the subject 
suggest the theory that effectiveness in 
reading is determined largely by the func- 
tional efficiency of the central nervous sys- 
tem, but there remains the question, “Does 
the level of functional efficiency of the eyes 
condition the activity of the central
nervous system?”

More specifically it is the purpose of the 
present study to explore the relationship 
between the speed and accuracy of per- 
ception among college Ss in reading simple 
prose both with and without saccadic 
movements of the eyes. There is the possi- 
bility that the functional efficiency of the 
eyes not only reflects the activity of the 
central nervous system but in a way conditions
it as well. It is the purpose, therefore,
to put this theory to the test.


OTHER STUDIES


In 1930, Vernon published the report 
of a study in which she attempted to gauge 
the efficiency of ocular muscles in the exe- 
cution of steady fixations and lateral 
movements and to relate this efficiency to
the reading process. Her Ss were nine 
adults, eight of them university graduates. 
She concluded from her data that individ- 
ual types of motor processes in reading are 
originally based upon underlying motor 
ability and upon the variation of the indi- 
vidual movements which make up the 
habit according to the ability of the eye 
muscles. She further stated that the use of 
frequent short pauses, or of less frequent 
long pauses, and the tendency to overrun
the word and then regress to it, are permanent
oculomotor habits, unconnected with
perception and assimilation of the reading
content.

Tinker, in a study reported in 1938, 
photographed the eye movements of 64 
university sophomores in an attempt 
(a) to measure the accuracy of oculomotor 
control as manifested by the accuracy of 
voluntary fixation during successive per- 
ceptual acts, (b) to measure the speed of 
convergence and divergence during suc- 
cessive visual fixations, and (c) to deter- 
mine the relation of these measures to 
speed of reading and eye movement meas- 
ures of reading. He administered the Min- 
nesota Speed of Reading Test and photo- 
graphed the eye movements during the 
reading of easy prose. In addition, he 
photographed successive fixations which 
the Ss made while reading numbers lo- 
cated at the ends of blank horizontal lines 
4½ inches long and 6 inches long.

Tinker concluded that for these Ss as a 
group, neither convergence, divergence, 
nor oculomotor efficiency in fixating num- 
bers at the end of a line-length sweep of 
the eyes bore any significant relation to 
proficiency in reading, but that when ex- 
treme cases in the group were compared 
there appeared to be a slight, consistent re- 
lation between motor efficiency of the eyes 
and reading ability. However, this study 
is limited by the fact that oculomotor con- 
trol was measured by having the Ss fixate 
numbers at the ends of long lines. The only 
similar saccadic movement in normal read- 
ing is the long sweep of the eyes from the 
end of one line to the beginning of the next 
line. These fixations represent a small 
fraction of the total number of fixations 
used in reading. 

Some years ago the writer of this article 
studied the oculomotor growth of college
students and of Ss from Grades I through
IX (Gilbert, 1953). Oculomotor ability 
was measured by eye movement records of 
the Ss while they tried to fixate single dig- 
its distributed at unequal distances along 
four-inch blank horizontal lines. The evi- 
dence from these records suggests that 
there is a growth in this type of oculo- 
motor ability, with the greatest amount 
taking place in the primary grades. Effi- 
ciency in fixating these single digits is 
positively associated with the rate of read- 
ing simple prose. 

The underlying causal factor, or factors, 
in the positive association of fixation 
pauses in visually fixating these single nu- 
merals and fixation pauses used in reading 
easy prose has not been clearly identified. 
Some neurologists (Nielsen, 1951) hold to 
the theory that the mental processing of 
the visual stimulus of the number 9 in- 
volves a different part of the brain from 
that used in processing the visual stimulus 
of the word nine. Until this theory is care- 
fully evaluated, caution should be exercised 
in interpreting these as identical mental 
functions. Certainly, the fixating of a 
single numeral (following a saccadic move- 
ment) as a method of studying oculomotor 
efficiency gives an indirect and uncertain 
measure of the influence of saccadic move- 
ments in reading easy prose. 

In a more recent study (Gilbert, 1959) 
the writer presented evidence pointing up 
the wide range of individual differences 
among college students in the speed with 
which they process the visual stimuli of 
phrases and sentences of sense material 
when the eyes are functioning without the 
necessity of making saccadic movements 
immediately preceding or following the 
flashing of the sense material. A positive 
correlation between the speed and accu- 
racy of processing the sense material and 
the rate of reading easy prose material 
supports the generally accepted theory 
that the effectiveness of the processing 
skills exercises a profound influence upon
reading. However, there remains the question,
“Is this the whole story? Is it possible
that the saccadic movements may
exercise an uneven influence on the speed 
of reading easy prose?” This question is 
not answered in the present body of litera- 
ture. 

The foregoing studies in oculomotor ef- 
ficiency differ from the present one in two 
important respects: (a) the material and 
procedure used, and (b) the techniques 
and analysis. 


SUBJECTS 


The Ss included 76 college juniors, sen- 
iors, and graduates, who were members of 
classes in educational psychology. Of the 
76 Ss, 29 were men and 47 women. For all 
of them, the native language was English. 

The rate of reading on a standard level, 
nonfiction and nontechnical article of ap- 
proximately 2,300 words ranged from 126 
words per minute to 602 words per minute. 
The average for the group was 263 words 
per minute with a standard deviation of 
70.80. 

The vision of the group was tested by 
asking each S if he or she could see clearly 
the words as they appeared on the screen. 
Any who reported that the words blurred 
were eliminated from the study. Only 8 
Ss were lost for this reason. A further 
check on the vision was made by flashing 
two-word phrases (all simple, short words) 
on the screen for 1/24 of a second without
a competing stimulus following. The 76 Ss
reported correctly 97.5% of the words. It 
seems reasonably certain that neither de- 
fective vision nor lack of familiarity with 
the simple words used in the sentences was 
an important factor in the results. 


MATERIALS 


Two tests (I and III) were designed to
measure speed and accuracy of visual perception
while reading simple prose sentences
without the necessity of moving the
eyes. Test I was composed of 20 sentences,
each eight words in length, a total of 160
words. In order to avoid as far as possible 
the influence of unfamiliarity with the 
vocabulary and sentence structure, the ma- 
terial was written in simple sentences, 
using only short and thoroughly familiar 
words. Each sentence was an independent 
unit of thought. The material was typed 
on an IBM typewriter using Warren high 
gloss paper. Each sentence was broken 
into four phrases of two words each. In 
order to facilitate photographing of the 
material each pair of words was typed 
on a separate page.

To prevent the necessity of moving the 
eyes, three small dots were photographed 
on eight frames of 16 mm. film. These dots 
indicated to the S where the words would 
appear. Two blank frames of the same il- 
lumination followed the dots to signal the 
appearance of the words. Each pair of
words appeared at the designated spot on 
the screen and remained on the screen for 
4/24 of a second. Each unit was separated 
from the others by 2/24 of a second. At the 
completion of each sentence, time was
allowed for the Ss to write down what they
saw. The score was one point for each 
word correctly reported. A perfect score 
was 160 points. 


EXAMPLE OF TEST I


The three dots (...) were projected on 
the center of the screen to indicate where 
each of the four units, or word pairs would 
be presented. For example, the word pairs, 
two boys, will visit, our home, next week, 
were all projected to the same position on 
the screen. Each pair of words was exposed
for 4/24 of a second and followed by 2/24
of a second of blank screen.


EXAMPLE OF TEST II


With one major exception Test II was
similar to Test I. In Test II the words
were presented along what might be
thought of as a line of print. In order to
read the sentence the S had to cover what
was equivalent to a nine-word line. The
time intervals were the same. Each pair of 
words was exposed for 4/24 of a second 
and followed by 2/24 of a second of blank 
screen. The following is an example of the 
items in this test: 


*Our school*will have*open house*this year" 
(*separated by three letter spaces) 


In Tests II and IV, the dots were used 
to indicate when and where the first pair 
of words would appear. Since the positions 
on the screen where the phrases were
presented were fixed, no dots were used for
the other phrases. 

Test III was similar to Test I except for 
the fact that the words were left on the 
screen for 2/24 of a second. Test IV fol- 
lowed the design of Test II, except for the 
fact that the words were exposed for 2/24 
of a second. 


FINDINGS OF THE INVESTIGATION 


Data concerning the influence of sac- 
cadic movements of the eyes on speed and 
accuracy of visual perception in reading 
simple prose sentences for the experimen- 
tal group of 76 cases are presented in the 
first column of Table 1. These data show 
that when the sentences were presented 
two words at a time in a manner which re- 
quired saccadic movements and each pair 
of words was exposed for 2/24 of a second 
the group reported accurately 64% of the 
material. When comparable material was 
read at the same speed without eye move- 
ments the group averaged 92% correct. 
Reading at this speed, the difference be- 
tween the mean percentage of accuracy 
with and without movements was nearly 
28. When the material was left on the 
screen for twice as long (4/24), the differ- 
ence between the mean percentages was 
about 12. The t ratio between the loss at 
the 2/24 level and the loss at the 4/24 level 
is 8.88, which is significant at better than 
.001 level. It seems that the closer the 
length of exposure time for the word pairs 


18 LUTHER C. GILBERT 


TABLE 1 


INFLUENCE OF SACCADIC MOVEMENTS OF THE EYES ON SPEED AND ACCURACY 
OF VISUAL PERCEPTION IN READING SIMPLE PROSE SENTENCES 

                                  Time Units 2-2-2            Time Units 4-2-4
                               Total    S1(a)    S4(b)     Total    S1(a)    S4(b)
                               group    group    group     group    group    group
                               76 Ss    19 Ss    19 Ss     76 Ss    19 Ss    19 Ss

Mean Percentage of Words
  Correctly Reported,
  No Saccadic Movement         92.03    97.93    84.61     96.32    99.28    93.16
  SD                           10.55     3.00    12.97      5.10     1.57     6.39

Mean Percentage of Words
  Correctly Reported,
  Saccadic Movement            64.35    79.80    52.57     84.56    93.98    75.14
  SD                           20.82    16.61    23.49     16.39     7.88    21.78

Difference Between Means,
  Saccadic Movement and
  No Saccadic Movement         27.68    18.13    32.04     11.76     5.30    18.02
  t*                           10.27     4.57     6.32      5.94     2.82     3.37

(a) Best 25% of the readers. 
(b) Poorest 25% of the readers. 
* All the t ratios are significant at better than the 1% level. 


approaches the average duration of the 
fixation pauses used in reading simple 
prose the less is the loss in visual percep- 
tion which is associated with saccadic 
movements. 

For the group as a whole the number of 
errors in visual perception associated with 
saccadic movements is decreased as the 
length of the exposure is increased. Is this 
equally true for both good and poor read- 
ers? In order to check this point the 76 
cases were ranked on gross reading rate 
from the fastest reader to the slowest 
reader. The best 25% of the readers was 
compared with the poorest 25%. The best 
group was designated as S1 and the group 
of the poorest readers was designated as 
S4. The data in Table 1 show a number of 
interesting things. Obviously, both good 
and poor readers make more perceptual 
errors at both levels of speed when they 
read with eye movements than they do 
when reading without movements. The 
saccadic movements are associated with a 


substantially greater loss in visual percep- 
tion for the poor readers at both the 2/24 
and 4/24 levels than they are for good 
readers. The t ratios are 2.784 for the 2/24 
level and 2.917 for the 4/24 level. Both are 
significant at better than the .01 level. Of 
special interest is the fact that both good 
and poor readers make fewer perceptual 
errors reading at the 2/24 level without 
saccadic movements than they do reading 
comparable material at the 4/24 level with 
saccadic movements. Another point of 
special interest is the evidence that both 
S1 and S4 readers can process this material 
mentally at a faster rate and more accu- 
rately than they actually do when reading 
with saccadic movements. 
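
The comparisons in this section can be checked directly against the means in Table 1. The short sketch below recomputes the losses associated with saccadic movements; the dictionary layout and the names are illustrative only, and the figures are the table's mean percentages.

```python
# Mean percentage of words correctly reported, taken from Table 1.
# Keys: (group, exposure in 24ths of a second, saccadic movement required?)
MEANS = {
    ("total", 2, False): 92.03, ("total", 2, True): 64.35,
    ("total", 4, False): 96.32, ("total", 4, True): 84.56,
    ("best", 2, False): 97.93, ("best", 2, True): 79.80,
    ("best", 4, False): 99.28, ("best", 4, True): 93.98,
    ("poorest", 2, False): 84.61, ("poorest", 2, True): 52.57,
    ("poorest", 4, False): 93.16, ("poorest", 4, True): 75.14,
}


def loss(group, exposure):
    """Drop in mean accuracy when saccadic movements are required."""
    without = MEANS[(group, exposure, False)]
    with_movement = MEANS[(group, exposure, True)]
    return round(without - with_movement, 2)


for group in ("total", "best", "poorest"):
    print(group, loss(group, 2), loss(group, 4))

# Both reader groups report more words at 2/24 without movements
# than at 4/24 with movements.
for group in ("best", "poorest"):
    assert MEANS[(group, 2, False)] > MEANS[(group, 4, True)]
```

The recomputed losses agree with the "Difference Between Means" row of the table, and at both exposure times the poorest readers lose more than the best readers.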
From these data one cannot be certain 
just why visual perception is so much 
better when the eyes function without sac- 
cadic movements than it is following sac- 
cadic movements. Neither is it clear why 
some Ss suffer a much greater loss in vis- 
ual perception following saccadic move- 


SACCADIC MOVEMENTS IN PERCEPTION IN READING 19 


ments than is true for other Ss. Contrary 
to traditional theory, it appears quite pos- 
sible that a very real part of this phe- 
nomenon is due to individual differences in 
functional motor efficiency of the eyes. 
Part may be due to individual differences 
in speed and accuracy for processing visual 
stimuli. There remains also the possibility 
that these factors are interacting. 


SUMMARY OF FINDINGS 


The findings of the investigation may be 
summarized as follows: 

1. These data indicate that both good 
and poor readers make more perceptual 
errors when they read with eye movements 
than they do when reading without move- 
ments. 

2. Both good and poor readers make 
fewer perceptual errors reading at the 
2/24-second level without saccadic movements 
than they do reading at the 4/24-second 
level with saccadic movements. 

3. Saccadic movements are associated 
with a substantially greater loss in visual 
perception for the poor readers than they 
are for the good readers. 


4. Slow readers seem to need, as a 
group, a longer interval of time to stabilize 
their fixations to a point of maximum ef- 
ficiency than is necessary for the fast 
readers. 

5. Both good and poor readers can proc- 
ess simple prose material mentally at a 
faster rate and more accurately than they 
actually do when reading with saccadic 
movements. 


REFERENCES 


GILBERT, L. C. Functional motor efficiency of 
the eyes and its relation to reading. Uni- 
ver. Calif. Publ. Educ., 1953, 11(3), 159- 
232. 

GILBERT, L. C. Speed of processing visual 
stimuli and its relation to reading. J. 
educ. Psychol., 1959, 50, 8-14. 

NIELSEN, J. M. Clinical neurology. Paul B. 
Hoeber, 1951. 

TINKER, M. A. Motor efficiency of the eye 
as a factor in reading. J. educ. Psychol., 
1938, 29, 167-174. 

VERNON, M. D. The movements of the eyes 
in reading. London: Med. Res. Council, 
Spec. Ser. No. 48, 1930. 


Received May 30, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY 
Vol. 50, No. 1, 1959 


THE RELATIONSHIPS BETWEEN VARIOUS FORMS OF 
AGGRESSION AND POPULARITY AMONG 
LOWER-CLASS CHILDREN 
GERALD S. LESSER 
Educational Clinic, Hunter College 


In the study of socialization, there has 
been considerable emphasis upon the re- 
actions of parents (Davis, 1944; Sears, 
Whiting, Nowlis, & Sears, 1953; Whiting 
& Child, 1953) and teachers (Appel, 
1942; Jersild & Markey, 1935; Levin, 
1955) to the aggressive behavior of chil- 
dren. However, only a limited amount of 
descriptive and normative information 
(Cunningham et al., 1951; MacRae, 1954; 
Piaget, 1932; Tuddenham, 1951) exists 
concerning the exact nature of the re- 
actions of the child’s peers to his aggres- 
sive behavior. In general, there has been 
little empirical specification of the social 
behaviors which are rewarded and pun- 
ished by the peer group. 

The nature of the peer group’s influence 
as socializing agents depends upon a large 
variety of variables. The application of 
rewards and punishments by the peer 
group differs greatly for different drive 
areas (e.g., sex, achievement, dependency, 
aggression, etc.). Also, the use of rewards 
and punishments by peers in any single 
drive area, such as aggression, undoubt- 
edly varies with the age, sex, and social 
class position of the children, among other 
variables. Social class influence upon re- 
actions to aggression has been especially 
emphasized. Davis’ (1944) report that 
aggression among lower-class children is 
learned as a socially rewarded form of 
behavior has received considerable atten- 
tion. 

However, the conclusion that an over-all 
positive relationship exists in lower-class 
groups between aggression and cultural 
approval appears to be an over-simplifica- 
tion. Since aggression is not a simple, 
unidimensional variable, it is plausible 


to assume that the acceptability of the 
aggressive responses to a lower-class peer 
group will depend upon the precise form 
of the aggression manifested. Which ag- 
gressive responses are accepted and posi- 
tively reinforced by peers in lower-class 
groups and which aggressive responses 
are rejected and negatively reinforced by 
peers in lower-class groups? This question 
indicates the focus of the present study. 

There is clear evidence (Cunningham 
et al., 1951; Murphy, 1937) that groups 
of young children are capable of making 
very fine discriminations among the spe- 
cific responses (even within a single drive 
area) of their age-mates; these responses 
are then differentially rewarded or pun- 
ished by the peer group's behavior. Five 
different manifestations of aggressive be- 
havior were defined in this study which, 
on a priori grounds, appeared to be con- 
ceptually independent. The hypothesis 
proposed here is that these different 
manifestations of aggression meet with 
different degrees of approval and dis- 
approval by the lower-class peer group. 


METHOD 


The Ss were 74 white boys (ages 10-0 
to 13-4) drawn from three fifth grade 
and two sixth grade classes in three pub- 
lic schools in New Haven, Connecticut. 
The Kuhlmann-Anderson intelligence quo- 
tients of these boys ranged from 78 to 
125, with a mean of 103. The three schools 
are in adjacent districts and the families 


* The writer wishes to thank Esther Viet® 
J. Allen Hickerson, May White, John We- 
solowski, Margaret Fitzsimons, and the 
teachers of Barnard, Scranton, and Roger 
Sherman schools in New Haven, Conn. for 
their helpful cooperation. 


AGGRESSION AND POPULARITY 21 


of the children constitute a relatively 
homogeneous upper lower-class group, ac- 
cording to the Index of Status Character- 
istics criteria of occupation, source of in- 
come, house type, and dwelling area 
(Warner, Meeker, & Eells, 1949, pp. 121- 
230). The town of New Haven had been 
previously mapped for social class loca- 
tion;² this mapping confirmed the social 
class placement of the families of the 
children in this study. 

The variable of popularity was meas- 
ured through a standard sociometric meas- 
ure. The children were asked to “List 
the 3 boys in your class that you would 
like to have for your best friends.” and 
to “List the 3 boys in your class that 
you wish were not in your class at all.” 
A popularity score was obtained for each 
S by subtracting the number of times he 
was mentioned for the second (“enemies”) 
question from the number of times he 
was mentioned for the first (“friends”) 
question. The Spearman rank-difference 
correlation coefficients (rho) between the 
boys’ popularity choices and girls’ pop- 
ularity choices ranged from +.74 to +.24 
for the different classes; the boys’ and 
girls’ popularity choices were therefore 
combined to provide a single measure of 
popularity for each S. 
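
The scoring rule just described (friend nominations received minus enemy nominations received) can be sketched as follows. The names, the nomination lists, and the function are hypothetical and serve only to illustrate the subtraction; in the study each child listed three names per question.

```python
from collections import Counter


def popularity_scores(friend_lists, enemy_lists):
    """Friend nominations received minus enemy nominations received."""
    friends = Counter(name for names in friend_lists.values() for name in names)
    enemies = Counter(name for names in enemy_lists.values() for name in names)
    everyone = set(friends) | set(enemies)
    # A Counter returns 0 for a boy who was never named on one of the questions.
    return {name: friends[name] - enemies[name] for name in everyone}


# Hypothetical nominations: rater -> boys named.
friend_lists = {"rater1": ["Al", "Bob"], "rater2": ["Al", "Cal"], "rater3": ["Al", "Bob"]}
enemy_lists = {"rater1": ["Cal"], "rater2": ["Bob"], "rater3": ["Cal"]}

scores = popularity_scores(friend_lists, enemy_lists)
print(scores)
```

A boy who is named on neither question does not appear in the result of this sketch; a fuller version would start from the class roster and assign him a score of zero.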

The measures of overt expression of 
aggression were obtained through a modi- 
fied sociometric device, the “Guess Who” 
technique. Each child in the class re- 
ceived a booklet containing a series of 
written descriptions of children and was 
instructed to identify each of these de- 
scriptive characterizations by naming one 
or more classmates. Both the boys and 
girls in each class completed this booklet; 
however, all Ss were instructed to list 


² The map of residential areas of New 
Haven, Connecticut, 1950, was compiled by 
[illegible] under the auspices of 
[illegible] 263, principal investi- 
gators A. B. Hollingshead and F. C. Redlich, 
Yale University. Copyrighted by Jerome K. 
Myers. 


only the names of boys in the class in 
answering the items. 

The following instructions appeared 
upon the cover of the booklet and were 
also read to the children by the experi- 
menter: 


In this booklet are some word pictures 
of boys in your class. Read each one and 
write down the names of the boys whom 
you think the picture fits. 


Remember: 


Any one picture may fit more than one 
boy. You may write as many names as you 
think belong under each picture. 

The same boy can be mentioned for 
more than one picture. 

You will have as much time as you 
need to finish. Do not hurry. 


Before beginning the task, the children 
were also told that only the experimenter 
(and not the teacher) would see their 
entries in the booklets and that the task 
had nothing whatever to do with their 
school grades. The children were not asked 
to put their names on the booklets until 
the end of the session. 

The definitions of the five categories 
of aggressive activity, and the sociometric 
items used in each category are as follows: 


1. Provoked Physical Aggression: 
Definition: to physically attack or injure 
after provocation. 
Items: 
a. Here is a boy who will fight, but only 
if someone picks on him first. 
b. Here is someone who will always fight 
back if you hit him first. 

2. Outburst Aggression: 
Definition: to display uncontrolled, “temper 
tantrum” aggressive behavior. 
Items: 
a. Here is a boy who gets so mad at 
times that he doesn’t know what he is 
doing. 
b. Here is someone who flies off the 
handle right away and is very hot- 
headed. 
c. This boy gets very, very mad at times. 

3. Unprovoked Physical Aggression: 
Definition: to physically attack or injure 
without provocation. 
Items: 
a. This boy starts a fight over nothing. 
b. Here is someone who is always looking 
for a fight. 
c. Here is a boy who gets mad while he 
is playing and ends up in a fight. 

4. Verbal Aggression: 
Definition: to verbally attack or injure. 
Items: 
a. This boy often threatens other boys. 
b. This boy is always scolding when he is 
playing a game with other boys. 
c. This boy makes up stories and lies 
about other children in the class. 

5. Indirect Aggression: 
Definition: to attack or injure indirectly 
through another person or object. 
Items: 
a. This boy tattles to the teacher about 
what other boys do. 
b. Here is a boy who breaks things that 
belong to others. 


In addition to the 13 items used to 
measure the various forms of aggression, 
the booklet contained seven filler items. 
The practical consideration of limited at- 
tention span to a written task in this age 
group necessitated the use of only a small 
sample of items for each aggression cate- 
gory. 

A different form of question was used 
for two additional items in the booklet 
which made it possible to estimate the 
children’s understanding of the “Guess 
Who” form of question. The children were 
asked to “List the 3 boys in your class 
who fight the most with other boys” and 
to “List the 3 boys in your class who say 
mean things, and threaten other boys the 
most.” For the two different forms of 
item measuring Unprovoked Physical Ag- 




gression and Verbal Aggression, respec- 
tively, the average tetrachoric correlation 
coefficients for the five classroom groups 
were +.90 and +.88. 

An overt aggression score was obtained 
for each S in each of the five aggression 
categories by counting the number of 
times his name was noted by both the 
boys and girls in the class. The Spearman 
rank-difference correlation coefficients 
(rho) between the boys' entries and the 
girls’ entries for the five aggression meas- 
ures ranged from +.87 to +.56; the en- 
tries were therefore combined to provide 
the five measures of aggression for each S. 
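
The Spearman rank-difference coefficient used throughout this study follows the familiar formula rho = 1 - 6(sum of d^2) / (n(n^2 - 1)), where d is the difference between a child's two ranks. The sketch below is a minimal illustration with hypothetical nomination counts; tied values are given the average of their ranks, in which case the formula is only an approximation, and the helper names are assumptions rather than anything from the study.

```python
def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman_rho(x, y):
    """Rank-difference formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical counts of times each of five boys was named on one measure.
named_by_boys = [5, 0, 2, 9, 1]
named_by_girls = [4, 1, 0, 7, 3]
print(round(spearman_rho(named_by_boys, named_by_girls), 2))
```

A high positive value of this kind, obtained class by class, is what justified pooling the boys' and girls' entries into a single score.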

The intercorrelations (the average rhos 
for the five classroom groups) among ag- 
gression variables are presented in Table 
1. These correlations are largely positive 
in direction. 

The aggression scores were derived from 
responses to verbally described classes of 
behavior and not from directly observed 
behavior. It is therefore possible that the 
children in making their judgments of 
each other’s aggression were responding to 
the connotations of the verbal descriptions 
rather than on the basis of their actual 
observations of the behaviors described in 
the items. However, there were highly sig- 
nificant correlations between peer group 
ratings and teacher observations of the ag- 
gressive behaviors; biserial correlation 
coefficients between peer and teacher 
judgments for the five aggression variables 
ranged from +.80 to +.72 (p < .01 in all 
cases). There was therefore strong evi- 
dence that teachers and peers respond in 


TABLE 1 
INTERCORRELATIONS AMONG AGGRESSION MEASURES 

                                 Provoked                 Unprovoked
                                 Physical     Outburst    Physical     Verbal      Indirect
                                 Aggression   Aggression  Aggression   Aggression  Aggression

Provoked Physical Aggression 
Outburst Aggression              +.36 
Unprovoked Physical Aggression   [illegible]  +.63 
Verbal Aggression                +.27         +.73        +.73 
Indirect Aggression              -.91         +.36        +.43         +.57 




TABLE 2 
CORRELATIONS BETWEEN AGGRESSION SCORES AND THE POPULARITY SCORE 

                                 School A  School B  School B  School C     School C     Mean
                                 Grade 5   Grade 5   Grade 6   Grade 5      Grade 6      for All

Provoked Physical Aggression 
  vs. Popularity                 +.47*     +.42      +.57*     +.10         -.02         +.31 
Outburst Aggression 
  vs. Popularity                 -.43*     +.07      +.12      -.60**       -.23         -.21 
Unprovoked Physical Aggression 
  vs. Popularity                 -.45*     -.22      +.05      -.62         -.57         [illegible] 
Verbal Aggression 
  vs. Popularity                 -.48*     -.34      -.02      -.65**       [illegible]  [illegible] 
Indirect Aggression 
  vs. Popularity                 -.55**    -.56*     -.81**    -.68         -.86         [illegible] 
Over-all Mean for All 
  Aggression Categories          -.29      -.13      -.02      -.49         -.49         -.28 

* p < .05. 
** p < .01. 


common to the observed aggressive be- 
haviors. 


RESULTS 


The Spearman rank-difference correla- 
tion coefficients (rho) between the five ag- 
gression variables and popularity are pre- 
sented in Table 2 for each class separately. 

The most striking feature of these re- 
sults is the absolute consistency among all 
five classes in the progressively more nega- 
tive correlations as the relationships be- 
tween popularity and aggression measures 
are considered in the following order: pro- 
voked physical aggression, outburst ag- 
gression, unprovoked physical aggression, 
verbal aggression, and indirect aggression. 
In each class, the correlation between pro- 
voked physical aggression and popularity is 
the most positive of the correlations for that class and 
the correlation between indirect aggression 
and popularity the most negative, with 
the correlations for outburst aggression, 
unprovoked physical aggression, and ver- 
bal aggression ordered in between as in- 
creasingly negative. 

When all five aggression variables are 
considered simultaneously, the over-all re- 


lationship between aggression and popu- 
larity is negative for all classes. However, 
there are considerable differences among 
the classes in this respect. For example, 
the over-all mean correlation for all ag- 
gression categories for School B, Grade 6 
is very nearly zero, the positive correla- 
tions for provoked physical aggression, 
outburst aggression, and unprovoked phys- 
ical aggression balancing the negative cor- 
relations for verbal aggression and indirect 
aggression. In contrast, School C, Grade 
6 manifests negative correlations for all 
aggression variables. Regardless of the 
size of the negative over-all relationship 
between aggression and popularity, how- 
ever, the same sequence of progressively 
increasing negative correlations is main- 
tained with absolute consistency for all 
classes. 


DISCUSSION 


This study presents evidence that, 
within the sample of Ss employed, there 
are clear differences in the lower-class peer 
group’s reaction to different manifesta- 
tions of aggressive behavior on the part of 
its members. Retaliation to an aggressive 




attack is a relatively approved form of 
behavior and verbal aggression and in- 
direct aggressive acts are strongly dis- 
approved among a sample of preadolescent 
upper lower-class boys. However, these 
findings would probably not be duplicated 
for samples which differ in sex, age, and 
social class position. It would be of great 
value for the study of socialization to ex- 
tend the knowledge of the parameters of 
peer group influence by utilizing a wide 
variety of samples differing, at least, for 
sex, age, social class, religion, and geo- 
graphic location. 

The present sample occupies an upper 
lower-class position. The present finding 
that there is a generally negative relation- 
ship (although more negative for some 
forms of aggression than for others) be- 
tween aggression and popularity among 
these upper lower-class boys is somewhat 
surprising in view of the statements and/or 
assumptions of previous studies (e.g., 
Davis, 1944; Mussen & Naylor, 1954). 
These findings qualify Davis’ (1944) ob- 
servation that “The lower classes not un- 
commonly teach their children and ado- 
lescents to strike out with fist or knife and 
to be certain to hit first.... The impor- 
tant consideration with regard to lower- 
class adolescents is that it is learned as an 
approved and socially rewarded form of 
behavior in their culture.” (Davis, 1944, 
p. 209). Davis also remarks that even 
when lower-class parents discourage ag- 
gression, “...the power of the street cul- 
ture in which the child and adolescent are 
trained overwhelms the parental verbal 
instruction. The rewards of gang pres- 
tige...seem to be on the side of the 
street culture.” (Davis, 1944, p. 210). The 
results of the present study suggest not 
only that there are upper lower-class peer 
cultures that do not approve aggression 
in general but also that very precisely 
discriminated kinds of aggression are dif- 
ferentially rewarded or punished by up- 
per lower-class children. 




It is apparent, of course, that these re- 
sults can not be generalized to peer be- 
havior beyond the confines of the school. It 
is possible that there would be a greater 
resemblance between the results of this 
study and the observations of other in- 
vestigators if measures of lower-class peer 
influence outside of the school had been 
obtained. 

One additional point may be noted with 
regard to the ordering of the correlations 
between popularity and the various forms 
of aggression. For all five classes, outburst 
aggression is the second least disapproved 
form of aggression, while verbal aggression 
and indirect aggression are the most 
strongly disapproved by peers. This or- 
dering reverses what appears experientially 
to be the response of adults to these forms 
of aggression in 10- to 13-year-old boys. 
Teachers, for example, view outburst, un- 
controlled, temper tantrum behavior as 
especially undesirable in a 10-13 year old 
boy, while verbal and indirect forms of 
aggression are frequently permitted or en- 
couraged by teachers as displaced, less 
destructive outlets for aggressive feelings. 
However, the reactions of peers to these 
aggressive responses are exactly the reverse. 
Outburst aggression is much less associated 
with a child’s unpopularity with his peers 
than verbal and indirect aggression. 


SUMMARY 


This study specifies the manner in which 
the acceptability of aggressive responses 
to a lower-class, preadolescent’s peer 
group varies with the precise form of ag- 
gression manifested. 

For five classroom groups of upper 
lower-class, preadolescent boys, five dif- 
ferent forms of the overt display of ag- 
gression were measured. The correlations 


Beginning with Wickman’s (1928) orig- 
inal study, a series of investigations into 
teachers’ attitudes has consistently ranked 
“temper tantrums” among the most serious 
behavior problems in children. 




between the various forms of aggression 
and popularity within the peer group in- 
dicate that among preadolescent boys 
Provoked Physical Aggression is relatively 
approved behavior, Outburst Aggression, 
Unprovoked Physical Aggression, and 
Verbal Aggression are progressively more 
disapproved, and Indirect Aggression is 
strongly disapproved. 


REFERENCES 


APPEL, M. H. Aggressive behavior of nursery 
school children and adult procedures in 
dealing with such behavior. J. exp. 
Educ., 1942, 11, 185-199. 

CUNNINGHAM, RUTH, ET AL. Understanding 
group behavior of boys and girls. New 
York: Teachers College, Columbia Uni- 
ver., Bureau of Publications, 1951. 

DAVIS, A. Socialization and adolescent per- 
sonality. Adolescence, Forty-third Year- 
book, Part I. Nat. Soc. for Study of 
Educ. Chicago: Univer. Chicago Press, 
1944. 

JERSILD, A. T., & MARKEY, F. V. Conflicts 
between preschool children. Child De- 
velopm. Monogr., No. 21. New York: 
Teachers College, Columbia Univer., 
1935. 

LEVIN, H. The influence of classroom control 
on kindergarten children’s fantasy ag- 
gression. Elem. Sch. J., 1955, 55, 462- 
466. 

MACRAE, D. A test of Piaget's theories of 
moral development. J. abnorm. soc. 
Psychol., 1954, 49, 14-18. 

MURPHY, LOIS B. Social behavior and child 
personality. New York: Columbia Uni- 
ver. Press, 1937. 

MUSSEN, P. H., & NAYLOR, H. K. The rela- 
tionships between overt and fantasy ag- 
gression. J. abnorm. soc. Psychol., 1954, 
49, 235-240. 

PIAGET, J. The moral judgment of the child. 
London: Kegan Paul, 1932. 

SEARS, R. R., WHITING, J. W. M., NOWLIS, 
V., & SEARS, PAULINE S. Some child- 
rearing antecedents of aggression and 
dependency in young children. Genet. 
Psychol. Monogr., 1953, 47, 135-234. 

TUDDENHAM, R. D. Studies in reputation III. 
Correlates of popularity among ele- 
mentary-school children. J. educ. Psy- 
chol., 1951, 42, 257-276. 

WARNER, W. L., MEEKER, MARCHIA, & EELLS, 
K. Social class in America. Chicago: 
Science Research Associates, 1949. 

WHITING, J. W. M., & CHILD, I. L. Child 
training and personality: a cross-cul- 
tural study. New Haven: Yale Univer. 
Press, 1953. 

WICKMAN, E. K. Children's behavior and 
teachers’ attitudes. New York: The 
Commonwealth Fund, 1928. 


Received July 8, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY 
Vol. 50, No. 1, 1959 


SOME RELATIONS BETWEEN TEST SCORES AND 
ITEM STATISTICS 


FRANCES SWINEFORD 
Educational Testing Service 


That there are definite relationships 
between score statistics for a group of 
examinees and item statistics based on 
data for the same group is well known. 
The most obvious one is that when the 
score is the number of correct responses 
the mean score is equivalent to the sum 
of the proportions of examinees who answer 
the items correctly. Gulliksen (1945) has 
proved that test reliability and score vari- 
ance both increase (a) as the average inter- 
item correlation increases and (b) as the 
variance of the item difficulty distribution 
decreases. His presentation was concerned 
with theoretical relationships leading to 
a set of general theorems. 
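
The first relationship mentioned above, that the mean number-right score equals the sum of the item proportions correct, can be verified on any response matrix. The sketch below uses a small, entirely hypothetical set of responses and exact fractions so that the equality is not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical right (1) / wrong (0) responses:
# rows are examinees, columns are items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]

n_examinees = len(responses)

# Mean of the number-right scores.
scores = [sum(row) for row in responses]
mean_score = Fraction(sum(scores), n_examinees)

# Proportion of examinees answering each item correctly, then summed.
proportions = [Fraction(sum(column), n_examinees) for column in zip(*responses)]
sum_of_proportions = sum(proportions)

print(mean_score, sum_of_proportions)
```

Both quantities are the same total of correct responses divided by the number of examinees, merely accumulated by rows in one case and by columns in the other.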

The present study was undertaken in 
the belief that simple empirical relation- 
ships which would serve to complement 
Gulliksen’s theoretical formulas could be 
found. The results shed considerable light 
on the way in which item statistics and 
test statistics are likely to be related in 
any new test. Thus, reasonable estimates 
of item statistics can yield worthwhile 
predictions of score statistics. There will 
be presented regression equations and 
corresponding tables which can be used 
to estimate the score standard deviation, 
test reliability, and mean item-test correla- 
tion all within certain limits to be pointed 
out later. 


The Variables 


Two test-score statistics of particular 
interest are the standard deviation and the 
reliability. The score standard deviation, 
σ_t, is expressed in terms of score units, 
which vary from test to test. In order to 
eliminate this unit of measurement we 
have selected as our first variable the ratio, 
(n - chance)/σ_t, where n is the number 


of items in the test and chance is the ex- 
pected mean score if all answer sheets were 
marked at random. This ratio is a pure 
number. Although it does vary with test 
length, the variation is not great for 
reliable tests when the longest is no more 
than eight or ten times the length of the 
shortest. Such variation can be ignored 
for practical purposes. 

The second variable is an estimate of test 
reliability. The reliability of the tests used 
in this study was computed by the Kuder- 
Richardson Formula [20] (1937, p. 158) for 
all tests for which Score = R and by the 
Dressel (1940, p. 309) adaptation of this 
formula for all tests for which Score = R - 
kW. In these formulas R represents the 
number of items which an examinee answers 
correctly, and W, the number answered in- 
correctly. As in the case of the standard de- 
viation, it is necessary to make an adjust- 
ment in order to render coefficients for 
different tests comparable in some sense. 
Other things equal, long tests are more re- 
liable than short tests. It was decided arbi- 
trarily to adjust each observed reliability 
coefficient by means of the Spearman- 
Brown prophecy formula so that it would 
describe a 100-item test. This adjusted co- 
efficient will be denoted rel.₁₀₀. 
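
The adjustment to a 100-item basis follows the standard Spearman-Brown formula, r' = mr / (1 + (m - 1)r), where m is the ratio of the new length to the original length. The sketch below is a minimal illustration; the function name and the example reliabilities are hypothetical and are not taken from the study's data.

```python
def spearman_brown(reliability, n_items, target_items=100):
    """Reliability projected to a test of target_items comparable items."""
    m = target_items / n_items  # lengthening (or shortening) factor
    return m * reliability / (1 + (m - 1) * reliability)


# Hypothetical examples: a 50-item test is projected upward,
# a 200-item test downward, and a 100-item test is unchanged.
print(round(spearman_brown(0.80, 50), 4))
print(round(spearman_brown(0.95, 200), 4))
print(spearman_brown(0.90, 100))
```

Putting every coefficient on the same 100-item footing removes the advantage that sheer length gives to the longer tests in the sample.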

The third variable, one of two item 
statistics to be used in the study, is a 
measure of the variability of item difficulty 
indices. The difficulty index used at Educa- 
tional Testing Service, known as “delta,” 
is a normal-curve deviate above which 
lies the area under the curve equal to the 
proportion of successful examinees. It is 
expressed in terms of a scale with mean of 
13 and standard deviation of 4. Thus, for 
example, if 50 per cent of the examinees 
who attempt an item answer it correctly, 




TEST SCORES AND ITEM STATISTICS 27 


the corresponding delta is 13.0; if 60 per 
cent are correct, the delta is 12.0; and if 
70 per cent are correct, the delta is 10.9. 
Delta is not linearly related to p, the 
proportion of correct responses to an item. 
It is used, however, because, for reasons 
not relevant to this study, the difficulties 
of ETS test items are recorded in terms 
of delta. The third variable, then, is the 
standard deviation of the n deltas, σ_Δ. 
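
The definition of delta given above can be reproduced with the inverse normal distribution: find the deviate z above which the area equals the proportion correct, then place it on the scale with mean 13 and standard deviation 4. The sketch below is an illustration using Python's standard library; the function name is hypothetical.

```python
from statistics import NormalDist


def delta(p_correct, mean=13.0, sd=4.0):
    """Normal deviate with area p_correct above it, on the 13-and-4 scale."""
    # The area above z equals p_correct, so the area below z is 1 - p_correct.
    z = NormalDist().inv_cdf(1.0 - p_correct)
    return mean + sd * z


for p in (0.50, 0.60, 0.70):
    print(p, round(delta(p), 1))
```

Equal steps in the proportion correct give unequal steps in delta (13.0, 12.0, 10.9 for the three proportions in the text), which is the nonlinearity noted above.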

The fourth variable is the reciprocal of 
the square of the mean of the item-test 
correlations, 1/r̄_it², which for simplicity 
will be denoted 1/r̄². C. T. Fan has shown 
that under certain conditions the relation 
between the Kuder-Richardson Formula 
[20] reliability and the function, 1/r̄², is 
essentially linear (Swineford, 1957). Item- 
test correlations are preferred to interitem 
correlations because they are more readily 
obtained. In a test of n items there are only 
n item-test correlations, whereas there are 
n(n — 1) interitem correlations. The inclu- 
sion of the item itself in the total score with 
which the item score is correlated produces 
in the correlation a spurious element, which 
is relatively higher for a short test than for 
a long one. Correcting each item-test cor- 
relation to eliminate this spurious element, 
however, is considered not worth the time 
and effort it would require. 


The Populations 


The populations, two in number, are 
multiple-choice tests. Tests scored by the 
formula, Score = R, comprise one popula- 
tion, and those scored by the formula, 
Score = R - kW, comprise the other. If the 
test is composed of K-choice items, k is 
1/(K - 1). In some instances a 
“test” as used here is a separately scored 
and analyzed Subtest from a larger ex- 
amination. Each of the populations will be 
handled separately through a sample of 
tests administered by Educational Test- 
ing Service in such Programs as College 
Entrance Examination Board, Graduate 
Record Examinations, Law School Admis- 


sion Test, National Board of Medical Ex- 
aminers, and others. The data were taken 
from ETS’ files. An effort was made to 
include in each sample a broad range of 
each variable. Greater success in this regard 
was achieved for Sample A, the “R” 
tests, than for Sample B, the “R — kW” 
tests. The tests vary in length from 15 to 
260 items, with the majority within the 
range, 35 to 175. 
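
The two scoring rules that define the populations can be illustrated with a brief sketch. It is offered only as an illustration: the function names are hypothetical, and it assumes that every item has the same number of choices K and that k = 1/(K - 1). Exact fractions are used so that the expected score under blind guessing comes out exactly.

```python
from fractions import Fraction


def formula_score(rights, wrongs, n_choices=None):
    """Score = R when n_choices is None; otherwise Score = R - kW, k = 1/(K - 1)."""
    if n_choices is None:
        return Fraction(rights)
    if n_choices < 2:
        raise ValueError("an item needs at least two choices")
    k = Fraction(1, n_choices - 1)
    return rights - k * wrongs


def expected_chance_score(n_items, n_choices, corrected):
    """Expected score of an examinee who guesses blindly on every item."""
    rights = n_items * Fraction(1, n_choices)  # expected number right
    wrongs = n_items - rights                  # expected number wrong
    return formula_score(rights, wrongs, n_choices if corrected else None)
```

Under R - kW scoring the expected chance score is zero, which is consistent with the numerator of Variable 1 reducing to n for the second population.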


The Data 


The basic statistics are the means, stand- 
ard deviations, and intercorrelations of the 
four variables. These are presented in Table 
1. It should be noted that for Sample B
Variable 1 becomes $n/\sigma_t$ when every
item has 1 + 1/k choices.

Variables 1, 2, and 4—those involving 
the score standard deviation, the test 
reliability and the item-test correlations— 
are interrelated to an extremely high 
degree. All the correlations for Sample B 
are lower than corresponding values for 
Sample A, as is consistent with the lower 
standard deviations. These differences are 
probably not attributable to the difference 
in the scoring procedure but principally 
to the fact that it was not possible to find 
in the files a wide representation of the 
four variables among tests scored R — kW. 

The data of Table 1 will be used to 
compute regression equations for predicting 
the score standard deviation and the test 
reliability. It should be understood that the 
beta coefficients in the regression equations 
are not unique. For example, ETS-pro- 
duced tests usually yield unimodal delta 
distributions centered near middle difficulty 
for appropriate examinee groups. The 
inclusion of tests with extremely
leptokurtic, platykurtic, or U-shaped delta
distributions would undoubtedly have
been reflected in the beta coefficients.

Since it is sometimes convenient to be
able to estimate the mean item-test
correlation, given the score standard deviation
and the test reliability, regression equations
for this purpose will also be computed
for both samples.

TABLE 1
BASIC DATA FOR TWO TEST SAMPLES

                                    Standard          Intercorrelations
Variable                  Mean      Deviation      1        2        3        4

Sample A: Score = R (74 Tests)
1. $(n - ch)/\sigma_t$   6.2891     2.6847                -.9702    .6739    .9570
2. $rel._{20}$           0.8650     0.1025      -.9702              -.5859   -.9595
3. $\sigma_\Delta$       2.2959     0.4391       .6739    -.5859              .6023
4. $1/\bar{r}^2$         7.8250     5.4344       .9570    -.9595    .6023

Sample B: Score = R - kW (54 Tests)
1. $(n - ch)/\sigma_t$   5.1917     1.0171                -.8652    .2929    .9408
2. $rel._{20}$           0.9166     0.0379      -.8652              -.0157   -.9297
3. $\sigma_\Delta$       2.0778     0.3089       .2929    -.0157              .0880
4. $1/\bar{r}^2$         4.8000     1.8585       .9408    -.9297    .0880


Prediction of Score Standard Deviation 


The correlations show that the item-test 
correlations are far more closely related 
to the score standard deviation than is the 
variability of the item difficulty indices. 
The multiple regression equations are even 
more striking. In the equations which 
follow, z will be used to denote a variable
in standard form, X will be used for the
raw-score form, and a prime will denote a
predicted value. The symbol $R_{1(34)}$
denotes the multiple correlation between
$X_1$ on the one hand and $X_1$ predicted
from $X_3$ and $X_4$ on the other.

The formulas in standard-score and in
raw-score form for predicting Variable 1 =
$(n - ch)/\sigma_t$ from Variable 3 = $\sigma_\Delta$ and
Variable 4 = $1/\bar{r}^2$ are as follows:

Sample A (Score = R)

$z_1' = .1530\,z_3 + .8649\,z_4$;   [1]
$X_1' = .9353\,X_3 + .4273\,X_4 + .7988$;   [2]
$R_{1(34)} = .9648$; $SE_{est} = 0.706$.

Sample B (Score = R - kW)

$z_1' = .2117\,z_3 + .9222\,z_4$;   [3]
$X_1' = .6972\,X_3 + .5047\,X_4 + 1.3206$;   [4]
$R_{1(34)} = .9642$; $SE_{est} = 0.270$.
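
As a check on how coefficients of this kind follow from the basic data, the sketch below recomputes the Sample A values for Equations [1] and [2] from the means, standard deviations, and intercorrelations of Table 1, using the ordinary two-predictor least-squares formulas. The function name and dictionary keys are illustrative assumptions, and small discrepancies in the last decimal place are to be expected because the tabled correlations are rounded.

```python
import math


def two_predictor_regression(r_ya, r_yb, r_ab, sd_y, sd_a, sd_b, m_y, m_a, m_b):
    """Least-squares regression of y on two predictors a and b from summary data."""
    denom = 1.0 - r_ab ** 2
    beta_a = (r_ya - r_yb * r_ab) / denom        # standard-score weights
    beta_b = (r_yb - r_ya * r_ab) / denom
    r_squared = beta_a * r_ya + beta_b * r_yb    # squared multiple correlation
    b_a = beta_a * sd_y / sd_a                   # raw-score weights
    b_b = beta_b * sd_y / sd_b
    return {
        "beta_a": beta_a,
        "beta_b": beta_b,
        "b_a": b_a,
        "b_b": b_b,
        "intercept": m_y - b_a * m_a - b_b * m_b,
        "R": math.sqrt(r_squared),
        "SE_est": sd_y * math.sqrt(1.0 - r_squared),
    }


# Sample A, Table 1: y = Variable 1, a = Variable 3, b = Variable 4.
sample_a = two_predictor_regression(
    r_ya=0.6739, r_yb=0.9570, r_ab=0.6023,
    sd_y=2.6847, sd_a=0.4391, sd_b=5.4344,
    m_y=6.2891, m_a=2.2959, m_b=7.8250,
)
```

The same function applied to the Sample B rows of Table 1 gives the weights for Equations [3] and [4].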


The independent contributions of
Variables 3 and 4 are proportional to the
coefficients in Equations [1] and [3]. The
contribution of $\sigma_\Delta$ is small, particularly in the
case of Sample A. Table 2 gives values of
$(n - ch)/\sigma_t$ for selected values of $\sigma_\Delta$ and $\bar{r}$
(the mean of the item-test biserial
correlations). It should be noted that linear
interpolation can be used within a column
but not within a row of the table. From the
tabled values it is readily seen that a very
homogeneous test (high item-test
correlations) produces such a large standard
deviation that a score range of six standard
deviations is very unlikely unless there is
an unusually wide spread in item difficulty.
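
The entries of Table 2 can be regenerated from raw-score Equations [2] and [4], and doing so shows why interpolation is linear in one direction only. The sketch below is illustrative; the names are hypothetical, and the coefficients are those printed in the two equations.

```python
# Raw-score coefficients (weight on sigma_delta, weight on 1/r_bar^2, constant).
COEFFICIENTS = {
    "A": (0.9353, 0.4273, 0.7988),   # Equation [2], Score = R
    "B": (0.6972, 0.5047, 1.3206),   # Equation [4], Score = R - kW
}


def predicted_variable_1(sample, sigma_delta, r_bar):
    """Predicted (n - ch)/sigma_t from sigma_delta and the mean item-test r."""
    b3, b4, constant = COEFFICIENTS[sample]
    return b3 * sigma_delta + b4 / r_bar ** 2 + constant


# Within a column (r_bar fixed) the prediction is linear in sigma_delta.
column_mid = predicted_variable_1("A", 3.0, 0.30)
column_avg = (predicted_variable_1("A", 3.5, 0.30)
              + predicted_variable_1("A", 2.5, 0.30)) / 2

# Within a row (sigma_delta fixed) it depends on 1/r_bar^2, so it is not linear.
row_mid = predicted_variable_1("A", 3.0, 0.35)
row_avg = (predicted_variable_1("A", 3.0, 0.30)
           + predicted_variable_1("A", 3.0, 0.40)) / 2
```

Here `column_mid` and `column_avg` agree, whereas `row_avg` overstates `row_mid`, which is the reason for the caution about interpolating within a row.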


TABLE 2
VALUES OF $(n - \text{chance})/\sigma_t$ FOR SELECTED
VALUES OF $\sigma_\Delta$ AND $\bar{r}$

                              $\bar{r}$
$\sigma_\Delta$    .20    .25    .30    .35    .40    .45    .50

Sample A (Score = R)
3.5                              8.8    7.6    6.7    6.2    5.8
3.0                              8.4    7.1    6.3    5.7    5.3
2.5                              7.9    6.6    5.8    5.2    4.8
2.0                              7.4    6.2    5.3    4.8    4.4
1.5                              6.9    5.7    4.9    4.3    3.9
1.0                              6.5    5.2    4.4    3.8    3.4
0.5                              6.0    4.8    3.9    3.4    3.0

Sample B (Score = R - kW)
3.5               16.4   11.8    9.4    7.9    6.9    6.3    5.8
3.0               16.0   11.5    9.0    7.5    6.6    5.9    5.4
2.5               15.7   11.1    8.7    7.2    6.2    5.6    5.1
2.0               15.3   10.8    8.3    6.8    5.9    5.2    4.7
1.5               15.0   10.4    8.0    6.5    5.5    4.9    4.4
1.0               14.6   10.1    7.6    6.1    5.2    4.5    4.0
0.5               14.3    9.7    7.3    5.8    4.8    4.2    3.7


Prediction of Test Reliability (Kuder-Richardson)


The close relation between the Kuder-
Richardson test reliability and the mean
item-test correlation is apparent from the
data of Table 1. It happens that in the
case of Sample A the use of Variable 1 as
well as Variable 4 reduces the standard
error of estimate by .007 from its value
when Variable 4 is the sole predictor,
whereas Variable 3 adds virtually nothing
to the precision of the prediction. For 
Sample B, on the other hand, the use of 
Variables 1, 3, and 4 yields a standard 
error of estimate only .0003 less than that 
associated with Variable 4 alone. The 


following equations are suggested for 
practical use: 


Sample A (Score = R)

$z_2' = -.6178\,z_1 - .3683\,z_4$;   [5]
$X_2' = -.02358\,X_1 - .006943\,X_4 + 1.0676$;   [6]
$R_{2(14)} = .9761$; $SE_{est} = .022$.


TABLE 3
VALUES OF $rel._{20}$ FOR SELECTED VALUES
OF $(n - \text{chance})/\sigma_t$ AND $\bar{r}$ AND FOR
SELECTED VALUES OF $\bar{r}$ ONLY

                                    $\bar{r}$
$(n - ch)/\sigma_t$    .20    .25    .30    .35    .40    .45    .50

Sample A (Score = R)
14                     .56
12                     .61    .67
10                            .72    .75
 8                            .77    .80    .82
 7                                   .83    .85    .86    .87
 6                                   .85    .87    .88    .89    .90
 5                                          .89    .91    .92    .92
 4                                                 .93    .94    .95
$\bar{r}$ only         .55    .72    .81    .86    .89    .92    .93

Sample B (Score = R - kW)
$\bar{r}$ only         .53    .70    .80    .85    .89    .91    .93

TABLE 4
VALUES OF $\bar{r}$ FOR SELECTED VALUES OF
$(n - \text{chance})/\sigma_t$ AND $rel._{20}$

                                        $rel._{20}$
$(n - ch)/\sigma_t$    .60    .65    .70    .75    .80    .85    .90    .95

Sample A (Score = R)
14                     .21    .22
12                     .22    .23    .24
10                            .24    .25    .26
 8                                          .28    .30    .32
 7                                                 .31    .34    .37
 6                                                 .33    .35    .39    .44
 5                                                        .38    .42    .48

Sample B (Score = R - kW)
14                     .22    .23
12                     .23    .24    .25    .26
10                            .25    .26    .27    .29
 8                                          .30    .31    .33
 7                                                 .33    .35    .38
 6                                                        .38    .41
 5                                                               .45    .51


In case an estimate of the reliability
is desired when only $\bar{r}$ is known, the
following formula is offered for Sample A:

$X_2' = -.01809\,X_4 + 1.0065$;   [7]
$R_{2(4)} = .9595$; $SE_{est} = .029$.

Sample B (Score = R - kW)

$z_2' = -.9297\,z_4$;   [8]
$X_2' = -.01895\,X_4 + 1.0075$;   [9]
$R_{2(4)} = .9297$; $SE_{est} = .014$.


Values of $rel._{20}$ for selected values of
$(n - ch)/\sigma_t$ and $\bar{r}$ are given for Sample A
and values of $rel._{20}$ for selected values of $\bar{r}$
are given for both samples in Table 3.
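
For readers who prefer to compute an estimate directly rather than read it from Table 3, the raw-score Equations [6], [7], and [9] can be applied as in the following sketch. The function names are hypothetical; the coefficients are those printed in the equations, and the estimates should not be trusted outside the range of the tabled values.

```python
def reliability_from_sd_and_r(var1, r_bar):
    """Equation [6], Sample A: rel.20 from (n - ch)/sigma_t and mean item-test r."""
    return -0.02358 * var1 - 0.006943 / r_bar ** 2 + 1.0676


def reliability_from_r_only(sample, r_bar):
    """Equations [7] (Sample A) and [9] (Sample B): rel.20 from mean item-test r."""
    slope, constant = {"A": (-0.01809, 1.0065), "B": (-0.01895, 1.0075)}[sample]
    return slope / r_bar ** 2 + constant


estimate_full = reliability_from_sd_and_r(14, 0.20)   # compare Table 3, first row
estimate_a = reliability_from_r_only("A", 0.30)       # compare Table 3, r-only row
estimate_b = reliability_from_r_only("B", 0.20)
```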


Estimation of the Mean Item-Test Correlation 


When item-analysis data are not avail- 
able, a reasonable estimate of the mean 
item-test correlation can be obtained from 
$(n - \text{chance})/\sigma_t$ and $rel._{20}$ if the latter was
computed by an internal procedure, such
as one involving split-half correlations.
The formulas for estimating Variable 4 =
$1/\bar{r}^2$ from Variable 1 = $(n - ch)/\sigma_t$ and
Variable 2 = $rel._{20}$ follow:

Sample A (Score = R)

$z_4' = .4448\,z_1 - .5280\,z_2$;   [10]
$X_4' = .9003\,X_1 - 28.0025\,X_2 + 26.38$;   [11]
$R_{4(12)} = .9655$; $SE_{est} = 1.415$.

Sample B (Score = R - kW)

$z_4' = .5427\,z_1 - .4602\,z_2$;   [12]
$X_4' = .9916\,X_1 - 22.5720\,X_2 + 20.34$;   [13]
$R_{4(12)} = .9687$; $SE_{est} = 0.461$.


These formulas have been used to
compute the entries in Table 4, which
gives values of $\bar{r}$ for selected values of
$(n - ch)/\sigma_t$ and $rel._{20}$.
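
The step from the predicted Variable 4 back to the mean item-test correlation is a simple inversion, since Variable 4 is the reciprocal of the squared mean correlation. The sketch below, with hypothetical names, applies Equations [11] and [13] and then takes the reciprocal square root; it assumes the predicted value is positive, which holds within the range of Table 4.

```python
import math

# Raw-score coefficients (weight on Variable 1, weight on Variable 2, constant).
EQUATIONS = {
    "A": (0.9003, -28.0025, 26.38),   # Equation [11], Score = R
    "B": (0.9916, -22.5720, 20.34),   # Equation [13], Score = R - kW
}


def mean_item_test_r(sample, var1, reliability):
    """Estimate the mean item-test correlation from (n - ch)/sigma_t and rel.20."""
    b1, b2, constant = EQUATIONS[sample]
    var4 = b1 * var1 + b2 * reliability + constant   # predicted 1 / r_bar^2
    if var4 <= 0:
        raise ValueError("inputs lie outside the range the equations cover")
    return 1.0 / math.sqrt(var4)


r_bar_a = mean_item_test_r("A", 14, 0.60)   # compare Table 4, Sample A, first row
r_bar_b = mean_item_test_r("B", 5, 0.95)    # compare Table 4, Sample B, last row
```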


Summary 


Empirical methods have been employed 
to develop formulas for estimating the 
score standard deviation, the test
reliability, and the mean item-test correlation
from related statistics. The variables were
kept as simple as possible, and only linear 
relationships have been considered, even 
though theory dictates the use of variables 
that are less readily obtained and relation- 
ships that are nonlinear. Nevertheless, the 
results show that the estimates can be 
made with a high degree of accuracy 
within the limits represented by the tests 
used. Without further evidence, it would 
probably be unwise to extrapolate beyond 
the values given in Tables 2, 3, and 4. 


REFERENCES 


DRESSEL, P. L. Some remarks on the Kuder-
Richardson reliability coefficient.
Psychometrika, 1940, 5, 305-310.

GULLIKSEN, H. The relation of item
difficulty and inter-item correlation to test
variance and reliability. Psychometrika,
1945, 10, 79-91.

KUDER, G. F., & RICHARDSON, M. W. The
theory of the estimation of test
reliability. Psychometrika, 1937, 2, 151-160.

SWINEFORD, F. Some relations between test
scores and item statistics. Res. Bull. No.
RB-57-2. Princeton, N. J.: Educational
Testing Service, Feb., 1957. (Appendix)


Received July 16, 1958. 




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 1, 1959 


THE COMPARATIVE PERSONALITY ADJUSTMENT OF
SUPERIOR AND INFERIOR READERS


RALPH D. NORMAN AND MARVIN F. DALEY 


University of New Mexico


In the voluminous literature on reading,
there is still a paucity of reports on the
comparative personality adjustment of
good and poor readers. This situation
exists despite the fact that such adjustment
is reputedly associated with reading
success. Almost 20 years ago, for example,
Gates (1941) stated that emotional
instability was found in about 75% of
retarded readers. Yet, only recently Louttit
(1957, p. 169) contended that "there is
need for further research on the positive
relationship between reading achievement
and personality adjustment." Many
previous studies have focused on the disabled
reader himself without contrast with
comparable controls; typically, these studies
have used clinical or case-study methods.
Such investigations are those by Vorhaus
(1952), who examined reading disability
cases with the Rorschach, and by others
(Blanchard, 1928; Challman, 1939;
Ephron, 1953; Kunst & Sylvester, 1943;
Staiger, 1957; Stauffer, 1947; Wheeler,
1954) who utilized case-study approaches.

Many studies which have concentrated
on comparison of good and poor readers
generally have lacked certain precision.
This lack has been due, in our opinion, to
the fact that only fairly lately have
sophisticated methods been introduced to
deal with patterns or interaction effects in
multivariable tests, and to the fact that
these have often dealt with subjective
instruments, making it difficult to study
differences in an exact manner.¹ While we
do not minimize subjective approaches,
more refined procedures hold out the
possibility of greater exactitude in dealing
with patterns. Indeed, a number of earlier
comparative studies (Gann, 1945;
Jackson, 1948; Spache, 1957) seem to support
a hypothesis of personality patterning as
related to reading disability.

¹ A recent exception is the study by
Tabarlet (1958) who, however, while using an
objective personality test, did not treat for
psychometric patterns.

It is the main purpose of the present 
study, therefore, to contrast the person- 
ality adjustment of superior and inferior 
readers as measured by a multivariable 
objective test with the aim of uncovering 
any discriminating psychometric patterns 
of adjustment. A second purpose is to pro- 
vide a description of perceptions of them- 
selves and their environment by both 
groups via an item analysis of this test. 


METHOD


For the present study, all Anglo (non- 
Spanish) white males with an IQ between 
84 and 116 on the Non-Language section 
of the California Test of Mental Maturity 
in the sixth grade in 14 middle class schools 
in Albuquerque, New Mexico, were
scrutinized.² The restricted IQ range was
chosen to cut out extremes in an attempt 
to control the factor of intelligence which 
correlates moderately with reading (Ste- 
phens, 1951). The Non-Language section 
was used since it is obviously less contami- 
nated by reading ability than the Language 
section. 

These pupils were examined for scores 
on the California Achievement Test to ob- 
tain two extremes in reading ability, i.e., 
those above or below one SD from their
own grade level. The former were defined
as superior readers (N = 42), all of whom
had a reading grade placement score of
7.6 or better, and a mean of 8.1. The latter
were defined as inferior readers (N = 41),
all of whom had a grade placement of 4.8
or less, and a mean of 3.9. Thus, the
average difference in reading skill between the
groups was 4.2 grades. The mean Non-
Language IQ of the superior group was
106.67; that of the inferior group was
98.85.

² These schools were chosen from the 50
in the city with consultation of guidance
personnel who felt they were quite
homogeneous with regard to socioeconomic status.
We wish to thank Stanley Caplan, Associate
Director of Guidance, for his assistance.

Examination of both groups on the basis 
of the criteria used (IQ and reading scores) 
produced the following results. An AV
yielded an F between groups on reading
grade placement of 1,267.24; df = 1, 81;
P < .01. Another AV yielded an F
(between groups on IQ's) of 18.10; df =
1, 81; P < .01. Correlations between IQ's
and reading score for each group
separately were not significantly different from
zero.

Since we are interested in personality 
adjustment and since it was determined 
that there was a significant difference be- 
tween the two groups in intelligence, the 
point may be raised concerning the pos- 
sible effects of the latter on personality 
adjustment, i.e., that adjustment is
associated with intelligence as well as with
reading. If this were so, analysis based 
upon personality measurements would re- 
flect results associated not with reading 
ability alone, but also with intelligence. 
However, two considerations regarding 
these three variables and their interrela- 
tionships should raise questions concerning 
the effects of intelligence. These are (a) 
that intelligence scores generally correlate 
poorly with personality scores, a point 
noted by Stephens (1951, p. 551) who 
states, “Typically, these investigations (on 
relationship between personality and in- 
telligence) come out with extremely low 
correlations, ranging from —0.40 to 0.20 
with an average close to zero,” and (b) 
that in our study, from a functional stand- 


point, a difference of eight IQ points (in 
the middle range) between our groups is 
certainly insufficient to account for the 
tremendous disparity of four grades in 
reading achievement. Further, on this 
point, are studies (Bouise, 1955; Gann, 
1945; Stewart, 1950) which matched Ss for 
intelligence and found adjustment differ- 
ences between differentially able readers, 
using subjective instruments. 

Both groups were administered the 
California Test of Personality (CTP), 
Elementary (Thorpe, Clark, & Tiegs, 
1953), which has 12 subtests and in which 
language difficulty has been kept at or
below fifth grade level.³ (For profile
purposes, the subtests have the advantages
of high reliability and comparatively low
intercorrelations.) Our general hypothesis
is that the personality adjustment of
superior readers will not differ significantly
from that of the inferior; more
particularly is the hypothesis that with a
multivariable instrument (the CTP) the
patterns of adjustment of both groups will not
differ significantly. To test these hypoth- 
eses, AV technique applied to testing group 
psychometric patterns proposed by Block, 
Levine, & McNemar (1951) was employed. 
To perform item analyses for our second 
purpose, tables of Edgerton and Paterson 
(1926) for testing significance of differ- 
ences between percentages were used. 
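
The logic of the item analysis can be sketched as a critical ratio for the difference between two independent percentages. The sketch is illustrative only: the published tables give standard errors of single percentages, and combining them as shown here is the conventional procedure, assumed rather than confirmed for this study. The function names are hypothetical, and the percentages are those reported for Item 27 (66% of the 42 superior readers and 24% of the 41 inferior readers answering affirmatively).

```python
import math


def se_of_percentage(percent, n):
    """Standard error of a percentage based on n cases."""
    return math.sqrt(percent * (100.0 - percent) / n)


def critical_ratio(percent_1, n_1, percent_2, n_2):
    """Difference between two independent percentages over its standard error."""
    se_difference = math.sqrt(
        se_of_percentage(percent_1, n_1) ** 2 + se_of_percentage(percent_2, n_2) ** 2
    )
    return (percent_1 - percent_2) / se_difference


# Item 27: superior readers (N = 42) versus inferior readers (N = 41).
cr_item_27 = critical_ratio(66, 42, 24, 41)
is_significant_5_percent = abs(cr_item_27) >= 1.96   # two-tailed normal criterion
```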


³ The reader who is concerned by this fact
when our inferior group had a mean reading
grade of 3.9 is reminded that Ss were
instructed to ask for help in understanding
any items. More importantly, the CTP
authors claim that in the test standardization
group, there were no significant differences
between median scores of successive grade
levels; if reading ability had a prominent
effect on the CTP, one would expect grade
changes in these average scores. Also, the
test authors state that about 20% of the
standardization samples were retarded one-
half year or more.


RESULTS


Table 1 gives the means and SD's of the
personality test scores of both groups.
The AV reveals an F ratio of 42.11
obtained for Groups; df = 1, 81; P < .01;
and an insignificant F of less than one for
Groups X Variables. Thus, our first
hypothesis that the personality adjustment of
the superior readers does not differ
significantly from that of the inferior readers
must be rejected; however, the null
hypothesis of no difference between patterns
of adjustment of the two groups cannot be
rejected.

In line with the secondary purpose, it
was found that 67 items of 144 on the
test were significant at the 5% level or
less in differentiating between the groups;
responses in the indicated direction reveal a
lower percentage for inferior readers in 64
of the 67 items.⁴


Discussion 


We have found no difference in
patterns of adjustment between the two
groups, but there is a definite difference
between the groups. Moreover, Table 1
reveals that there is a relatively constant
difference of from five to 10 points between
the groups in the 12 test variables, the
inferior readers demonstrating consistently
poorer adjustment in all areas. We may
conclude that there is no difference in
the kind of adjustment made by the two
groups, considering kind to be reflected
in patterns; but there is a strong difference
in degree of adjustment. Thus, our
results are only partially in agreement with
Harris (1956, pp. 264-265) who states,
"Statistical studies comparing the
personalities of poor readers with good readers
have failed to reveal any consistent group
differences. This is probably due to the
mistaken attempt to find a common
personality type or problem in reading
disability cases."

TABLE 1
MEANS AND STANDARD DEVIATIONS OF T
SCORES ON CALIFORNIA TEST OF
PERSONALITY VARIABLES FOR BOTH GROUPS

                                           Inferior         Superior
                                           Readers          Readers
Variables                                 Mean    SD       Mean    SD

Self-Reliance                             46.90   6.77     52.38   7.27
Sense of Personal Worth                   50.49   8.30     57.33   7.36
Sense of Personal Freedom                 46.17   7.57     53.71   6.88
Feeling of Belonging                      47.41   8.81     56.50   6.79
Withdrawing Tendencies (Freedom from)     51.32   7.84     59.38   6.96
Nervous Symptoms (Freedom from)           49.44  11.43     55.17   7.61
Social Standards                          43.95   8.98     53.83   7.36
Social Skills                             49.17   9.84     54.36  10.19
Anti-Social Tendencies (Freedom from)     44.80  10.21     52.14   9.44
Family Relations                          45.12   8.47     52.52   6.83
School Relations                          50.05   9.79     57.71   7.14
Community Relations                       43.07   8.72     50.43   5.22

⁴ A table giving item analysis data has
been deposited with the American
Documentation Institute. Order Document No. ...
from ADI Auxiliary Publications Project,
Photoduplication Service, Library of
Congress, Washington 25, D. C., remitting $1.25
for microfilm or $1.25 for 6 by 8 in.
photocopies. Make checks payable to
Chief, Photoduplication Service.

Although we recognize that some seven 
of 144 item comparisons should be
significant by chance alone at the 5% level,
there still remain 60 items reflecting real 
differences. It is interesting to note ten- 
dencies for these significant items to cluster 
together in a manner different from that 
based on the rationale of the 12 test 
variables, since they cut across them. In 
the discussion following, we have adopted 
a sort of “need-press” scheme of Murray 
(1938), selecting items most illustrative of 
these tendencies. A few items appear in 
more than one of these clinical clusters; 
six of the 67 do not appear, three of which 
(66, 67, 68) indicate inferior readers with 
significantly higher percentages of
affirmative answers to questions on physical
symptoms, and three of which (94, 98,
140) are reversals.
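
The allowance for chance findings made above can be written out as a short calculation. The sketch assumes, purely for illustration, that the 144 item comparisons are independent and that each has a .05 probability of reaching significance by chance; the items of a personality inventory are in fact intercorrelated, so the binomial tail shown here is only a rough guide. The variable names are hypothetical.

```python
import math

N_ITEMS = 144        # item comparisons made
ALPHA = 0.05         # significance level used for each comparison
N_SIGNIFICANT = 67   # comparisons actually found significant

# Number of "significant" items expected by chance alone (about seven).
expected_by_chance = N_ITEMS * ALPHA

# Items remaining after the chance allowance, using the rounded figure of seven.
remaining_items = N_SIGNIFICANT - round(expected_by_chance)

# Probability of 67 or more chance "hits" under the independence assumption.
tail_probability = sum(
    math.comb(N_ITEMS, k) * ALPHA ** k * (1 - ALPHA) ** (N_ITEMS - k)
    for k in range(N_SIGNIFICANT, N_ITEMS + 1)
)
```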

Five clusters seem to describe the in- 
ferior reader's “presses” or perceptions of 
his external environment. Poor Family 
Interaction is suggested by one grouping 
which goes well beyond the test category of 
Family Relations because most of the 
items cited here fall outside that category. 
Two examples are: 


27. May you usually bring your friends 
home when you want to? (66; 24)⁵

114. Do you like both of your parents 
about the same? (100; 73) 


Ten other significant differences (Items 15, 
21, 31, 34, 41, 46, 78, 81, 115, 119) show 
unstable home environment, family dis- 
cord, conflict with and about parents or 
relatives, and authoritarian discipline. 

A second cluster is Rejection by Others 
wherein items reflect feelings of being 
scorned, rebuffed or excluded from activi- 
ties by peers and others. Two examples 
from this group are: 


23. Do people often think that you can- 
not do things very well? (24; 61) 

55. Are people often so unkind or unfair 
that it makes you feel bad? (19; 49) 

Other items in this cluster are 14, 16, 43, 
48, 53, 94, 121, 130, and 132. 

A third grouping involves Frustration- 
Aggression by Others; here, maltreatment, 
quarreling, etc., instigated by others char- 
acterize the common element of the fol- 
lowing examples and of other items (55, 
60, 101, 103, 115, 129): 


107. Do classmates often quarrel with 
you? (7; 37) 

126. Does it seem to you that some of the 
teachers "have it in for" pupils? (12; 32) 


Fourthly, a cluster of items centers in
Conflicts About Other-Dominance, wherein
exist perceptions of prohibitions,
restraints, coming out second-best, etc.
Examples below and Items 26, 31, 34, 78, 81,
85, and 88 are significant:


10. Do your parents or teachers usually
need to tell you to do your work? (12; 49)

30. Are you prevented from doing most of
the things you want to do? (9; 37)

⁵ Numbers in parentheses after items refer
to respective percentages of superior and
inferior readers who answer the item
affirmatively.


A final group of “presses” depicts En- 
vironmental Deprivation, suggesting defi- 
ciencies of companionship, friendship, and 
interesting "things" and “places”; these 
items, besides the two below, are 32, 43, 58, 
59, and 135: 


45. Do you have just a few friends? (12; 
37) 


134. Do you think there are too few in- 
teresting places near your home? (24; 54) 


Four categories refer more to internally 
expressed “needs” of the retarded reader. 
Aggression Towards Others is suggested in 
the following and other items (78, 82, 89, 
99, 103) ; the common element appears to 
be overt or covert opposition to others, 
taking the form of resistance or more di- 
rect action: 


74. Is it all right to disobey teachers if 
you think they are not fair to you? (0; 20) 

101. Do people often act so mean that 
you have to be nasty to them? (14; 37) 


Need for Impulsivity in Action wherein 
response is in accordance with impulse re- 
gardless of others, irresponsibility, and de- 
fiance of convention seems demonstrated 
by Items 1, 74, 77, 81, 82, 86, and 142 and 
these two examples: 


73. Is it all right to cheat in a game when
the umpire is not looking? (0; 15)

75. Should one return things to people
who won't return things they borrow? (95; …)


Rejection of Others appears to be the
need common to the two examples below
and to Items 77, 117, 125, and 143. This
need is characterized by poor positive
identifications, and abandonment or
remaining indifferent to others:


Are you proud of your school? (100; …)

141. Do you dislike many of the people
who live near your home? (12; 32)

Finally, there is a cluster which centers
in Inferiority Feelings, encompassing eight
items (12, 20, 24, 39, 52, 86, 88, 124).
Examples:


24, Do most of your friends and class- 
mates think you are bright? (71; 39) 
88. Does it make you feel angry when 
you lose in games at parties? (2; 15) 
The significant items showing
"reversals" perhaps deserve a word of comment.
These are:


94. Do the boys and girls seem to think
you are nice to them? (51; 83)

98. Have unfair people often said that
you made trouble for them? (50; 29)

140. Do you help children keep away
from places where they might get sick? (29; …)


We feel that, despite the retarded
reader's strong feelings of rejecting and being
rejected, there crops out in these items a
longing to be thought well of by others,
especially peers. Item 140, in fact, has
the second highest CR of all 67 significant
items.
The results are strikingly similar to
those found by Tabarlet (1958) who
compared 43 fifth graders, retarded two or
more years in reading, with 29 average
readers, all within average intelligence.
With a sociometric device, Tabarlet found
that average readers chose more of their
classmates and were chosen by the latter
more often than were retarded readers;
these differences were statistically
significant. Our clusters of rejection of and by
others among inferior readers correspond
closely with this finding. Moreover, on an
objective personality test other than the
CTP, Tabarlet finds average readers
scoring significantly higher than the retarded
in categories of behavioral maturity,
interpersonal skills, social participation,
satisfying work and recreation, and adequate
outlook and goals. Apparently, these
outcomes are much like ours, even though our
controls were superior rather than average
readers.

Some comment is needed here about the 
possible effect of socioeconomic status on 
CTP results, especially since we selected 
our “middle class” schools on impression 
(see Footnote 2) and since Tabarlet re- 
ports that there appeared to be a tendency 
for more retarded readers to come from 
schools in lower socioeconomic areas. 
Some seven significant items (15, 32, 59, 
117, 135, 142, and 134, cited above) might 
possibly support such a notion. On the 
other hand, there are eight items (33, 76, 
79, 106, 110, 136, 139, 144), five of which 
deal directly with economic distress, which 
yield no significant differences between our
two groups. It is indeed difficult to tell 
from the test itself whether the answer to 
items such as 134 springs from fact or the 
misperceptions of the retarded reader. We 
are prone to favor the latter viewpoint. 

Design of the present study, of course, 
does not permit us to state whether per- 
sonality problems are a cause or result of 
poor reading ability. Louttit (1957) cites 
numbers of studies whose authors make 
such a statement; our data simply confirm 
that the two factors are associated. 


SUMMARY 


A group of 42 superior readers was compared
with a group of 41 inferior readers; both 
groups were composed of sixth grade boys 
from middle class schools and had com- 
parable mean IQ's. Mean reading achieve- 
ment difference between groups was 4.2 
grades. They were given the California 
Test of Personality in order to ascertain 
differential patterns of personality adjust- 
ment. Analysis of variance revealed no 
difference in pattern, but superior readers 
achieved significantly higher adjustment 
scores on all parts of the test. Inspection 
of 67 significant items suggested several 
clusters of “needs” and “presses” which dif- 
ferentiated between the two groups. 




REFERENCES 


BLANCHARD, PHYLLIS. Reading disabilities in
relation to maladjustment. Ment. Hyg.,
1928, 12, 772-788.

BLOCK, J., LEVINE, L., & MCNEMAR, Q.
Testing for the existence of psychometric
patterns. J. abnorm. soc. Psychol., 1951,
46, 356-359.

BOUISE, L. M. Emotional and personality
problems of a group of retarded readers.
Elem. Eng., 1955, 32, 544-548.

CHALLMAN, C. C. Personality adjustment and
remedial reading. J. except. Child., 1939,
6, 7-12.

EDGERTON, H. A., & PATERSON, D. G. Table
of standard errors and probable errors
in percentages for varying numbers of
cases. J. appl. Psychol., 1926, 10, 378-391.

EPHRON, BEULAH. Emotional difficulties in
reading. New York: Julian Press, 1953.

GANN, EDITH. Reading difficulty and
personality organization. New York: King's
Crown Press, Columbia Univer., 1945.

GATES, A. I. The role of personality
maladjustment in reading disability. J. genet.
Psychol., 1941, 59, 77-83.

HARRIS, A. J. How to increase reading ability.
New York: Longmans Green, 1956.

JACKSON, J. A survey of psychological, social,
and environmental differences between
advanced and retarded readers. J. genet.
Psychol., 1948, 65, 113-131.

KUNST, MARY, & SYLVESTER, EMMY.
Psychodynamic aspects of the reading problem.
Am. J. Orthopsychiat., 1943, 13, 69-76.

LOUTTIT, C. M. Clinical psychology of
exceptional children. New York: Harper,
1957.

MURRAY, H. A., ET AL. Explorations in
personality. New York: Oxford Univer.
Press, 1938.

SPACHE, G. D. Personality patterns of
retarded readers. J. educ. Res., 1957, 50,
461-469.

STAIGER, R. C. Self-responsibility and
reading. Educ., 1957, 77, 561-565.

STAUFFER, R. G. Clinical approach to
personality and the disabled reader. Educ.,
1947, 67, 427-435.

STEPHENS, J. M. Educational psychology.
New York: Holt, 1951.

STEWART, R. S. Personality maladjustment
and reading achievement. Am. J.
Orthopsychiat., 1950, 20, 410-417.

TABARLET, B. E. A study of the mental health
status of retarded readers. Unpublished
doctoral dissertation, Louisiana State
Univer., 1958.

THORPE, L. P., CLARK, W. W., & TIEGS, E. W.
California test of personality. Los
Angeles: California Test Bureau, 1953.

VORHAUS, P. G. Rorschach configurations
associated with reading disability. J. proj.
Tech., 1952, 16, 3-19.

WHEELER, L. R. Dealing with emotional
problems in the classroom. Educ., 1954,
74, 566-571.




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 1, 1959 


CONCURRENT VALIDITY OF THE “WARM TEACHER SCALES” 


M. STEPHEN SHELDON, JACK M. COALE, AND ROCKNE COPPLE 
Colorado State College 


In recent years a number of new psy- 
chological inventories have been published, 
and scales on existing inventories de- 
veloped that purport to be helpful in 
selecting teachers and candidates to be- 
come teachers (Cook, Leeds, & Callis, 
1952; Hathaway & McKinley, 1943; 
Ryans, in press). In most cases empirical
validity coefficients have been presented 
for these instruments. The criteria used 
for the validation were, in general, ratings 
made by Supervisors, administrators, pu- 
pils, or trained observers. Such criteria 
may in themselves be questionable.
Nevertheless the scales seem to be getting at
some existent personality configuration
that is rationally related to what a good
teacher ought to be.

There are two indications that these
teacher selection scales are each measuring
much the same thing. First, the authors
of several of the scales use almost
synonymous adjectives in describing the
attributes that are being measured, e.g.,
"warm," "friendly," "rapport building."
Second, though the different scales use
different types of items (Likert, true-false,
forced choice), and the items themselves
deal with different kinds of things, there is
a relatively high correlation between the
scales.¹

¹ Marjorie Rapp, Selection and
Counseling Service, School of Education, University
of California at Los Angeles, computed
correlations between the four MMPI teacher
selection scales used in this study. The
magnitude of these correlations was of the
order of .50. Further, Cook and Medley
(1954), in comparing the Ho and Pv Scales,
intentionally selected items which
discriminated between individuals having high and
low scores on the MTAI.

If the assumptions are true that: Good
teachers possess a particular personality
structure and that many of the "warm
teacher scales" are in some ways
measuring this structure, then it should also
hold true that a group of individuals
attaining high scores on a number of these
scales ought to look quite different on
certain other psychological measures than
a group attaining low scores on the same
scales. It is the purpose of this paper to
test that hypothesis and determine on
which of several personality measures the
high and low scores on the "warm teacher
scales" differ.


Procedures 


The initial sample of Ss used in this 
study was comprised of 176 college fresh- 
men enrolled in general psychology at 
Colorado State College. (The students 
were actually enrolled in five different 
sections of the course.) Each took the 
Minnesota Teacher Attitude Inventory
(MTAI) (Cook et al., 1952) and responded
to the items from the Minnesota
Multiphasic Personality Inventory
(MMPI) (Cook & Medley, 1954; Gowan
& Gowan, 1955; Meehl & Hathaway,
1946) that are included on the K, Hostility
(Ho), Pharisaic Virtue (Pv), and
Teacher Prognosis (Tp) scales. Standard
scores (T) were computed on each of the 
five scales and summed for each S in the 
sample. The 10 highest and 10 lowest 
cumulative scores were identified and all 
agreed to take part in further testing. 
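
The selection step just described can be sketched in a few lines. This is only an illustration under stated assumptions: T scores are computed within the sample as 50 + 10z (the article does not say whether published norms were used), no scale is reflected before summing (the article does not say), and the function names are hypothetical.

```python
from statistics import mean, pstdev


def t_scores(raw):
    """Convert raw scores to T scores (mean 50, SD 10) within the sample.

    Assumption: standardization uses the sample's own mean and SD.
    """
    m, sd = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / sd for x in raw]


def extreme_groups(scale_scores, k=10):
    """Sum T scores over all scales and pick the k highest and k lowest Ss.

    scale_scores maps a scale name to a list of raw scores, one per subject,
    in the same subject order for every scale.
    Returns (indices of the k highest composites, indices of the k lowest).
    """
    n = len(next(iter(scale_scores.values())))
    composite = [0.0] * n
    for raw in scale_scores.values():
        for i, t in enumerate(t_scores(raw)):
            composite[i] += t
    order = sorted(range(n), key=lambda i: composite[i])
    return order[-k:][::-1], order[:k]
```

With five scales and 176 subjects, `extreme_groups(scores, k=10)` would return the index lists corresponding to Group A and Group B.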

It was specifically hypothesized that the
following measures would differentiate
between the high and low scoring groups
(henceforth identified as Group A and
Group B, respectively): (a) the Study of
Values (Allport, Vernon, & Lindzey,
1952), (b) six needs on the Edwards Personal
Preference Schedule (EPPS) (Edwards,
1954), (c) the California F Scale
(Adorno, Frenkel-Brunswik, Levinson, &
Sanford, 1950), (d) the Wechsler Adult
Intelligence Scale (WAIS) (Wechsler,
1955), and (e) 10 cards from the Thematic
Apperception Test (TAT) (Murray,
1943), scored in the manner suggested
by Friedman (1957). These tests were
administered to Ss in both Groups A and
B.

For scoring the TAT a 60-item Q sort 
was devised (Stephenson, 1953). The 60- 
item sort included 10 items descriptive of 
each of six needs specified by Murray 
(1938), measured by the EPPS (Edwards, 
1954), and rationally considered relevant 
in this study to differentiate between the 
A and B groups. Specifically measured 
were the needs for affiliation, nurturance, 
aggression, dominance, succorance, and 
abasement. The association of each item 
with the corresponding need was inde- 
pendently checked and agreed upon by a 
jury of four professional psychologists. 

After each S had related his story in 
response to a TAT card the examiner 
asked that the S describe the hero of his 
story by throwing the Q sort. The forced
distribution for this sort was a nine-point
scale. This procedure was followed with
Ss from Groups A and B for each of 10
cards.

Since the Q sort was to be used by each
S to describe 10 different TAT heroes,
each item was a descriptive statement
having no subject and a third person verb,
e.g., “enjoys sympathy” (succorance), “is
affectionate” (nurturance), “feels guilty
when things go wrong” (abasement).
Many of the items were direct quotations 
from the EPPS manual (Edwards, 1954). 

The scores assigned by the S to the
items associated with each need were
summed for each sorting. This resulted in
a score for each need for every TAT hero
for all Ss. The total scores for each need
were then averaged for individual Ss.
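
The scoring rule in the preceding paragraph can be illustrated as follows. This is a minimal sketch, not the authors' procedure: the data layout (a dictionary of item ratings per card) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def need_scores(sort_ratings, item_need):
    """Sum the pile values (1-9) of the items keyed to each need for one sort.

    sort_ratings: item id -> pile value the S assigned on this card.
    item_need:    item id -> name of the need the item describes.
    """
    totals = defaultdict(int)
    for item, rating in sort_ratings.items():
        totals[item_need[item]] += rating
    return dict(totals)


def subject_need_means(sorts, item_need):
    """Average each need's summed score over all of one S's sorts (cards)."""
    per_card = [need_scores(s, item_need) for s in sorts]
    return {
        need: sum(card[need] for card in per_card) / len(per_card)
        for need in per_card[0]
    }
```

In the study each S would contribute 10 sorts of 60 items, giving one averaged score for each of the six needs.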


The significance of the difference between
the means of Group A and Group
B on each of the psychological measures
was tested by computing t ratios.
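
For readers who want the computation spelled out, a t ratio for two independent groups might be obtained as below. The pooled-variance form is an assumption; the article does not state which formula was used.

```python
from math import sqrt
from statistics import mean, variance


def t_ratio(group_a, group_b):
    """Pooled-variance t ratio for the difference between two group means."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (
        na + nb - 2
    )
    standard_error = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error
```

The result is referred to a t table with na + nb - 2 degrees of freedom.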


Results 


The results of the t tests appear in Table
1.

The results in Table 1 show that the 
mean of Group A was significantly higher 
on the WAIS (more intelligent) and sig- 
nificantly lower on the California F Scale 
(less authoritarian). 

No significant difference was found be- 
tween Group A and Group B on any of 
the scales of the Study of Values. 

As expected, on both the EPPS and the 
TAT, Group A demonstrated a significantly
higher need for affiliation and a
significantly lower need for succorance 
than did Group B. The significantly lower 
need for abasement demonstrated by 
Group A on the EPPS was not sub- 
stantiated by the TAT measure of this 
same need. 

Certain unexpected results occurred. On 
the TAT, Group A demonstrated a sig- 
nificantly lower need for dominance than 
did Group B and a significantly higher 
need for aggression than did Group B. On 
the EPPS these findings were reversed,
Group A scoring a significantly higher
need for dominance than did Group B
and a significantly lower need for aggression.


TABLE 1

TABLE OF MEANS, MEAN DIFFERENCES, AND t RATIOS FOR GROUP A AND GROUP
B SCORES FOR EACH MEASURED VARIABLE

Test              Scale          Mean A    Mean B    Diff.    t
WAIS                             117       101       16       2.64*
Calif. F Scale                   39.55     60.33     20.78    3.74**
Study of Values   Theoretical    39.00     40.50     1.50     .70
                  Economic       39.78     39.00     .78      .48
                  Aesthetic      37.67     40.00     2.33     1.23
                  Social         37.00     36.00     1.00     [illegible]
                  Political      40.89     40.17     .72      .11
                  Religious      45.89     43.17     2.72     1.14
TAT               Affiliation    56.78     53.17     3.61     2.31*
                  Nurturance     56.22     55.33     .89      .18
                  Aggression     51.33     46.17     5.16     [illegible]
                  Dominance      40.82     47.67     6.45     2.41*
                  Succorance     46.11     53.67     7.56     3.01**
                  Abasement      42.44     46.00     3.56     1.00
EPPS              Affiliation    18.56     15.83     2.73     2.16*
                  Nurturance     14.00     16.17     2.17     1.44
                  Aggression     8.33      11.50     3.17     8.21**
                  Dominance      14.78     8.83      5.95     8.27**
                  Succorance     9.56      14.33     4.77     2.74*
                  Abasement      9.33      21.00     11.67    5.64**

* Significant at the 5% level of confidence.
** Significant at the 1% level of confidence.


Discussion


It appeared to the experimenters that
there was some credence to be allowed
their original hypothesis, i.e., groups of
freshman students at the institution where
the experiment was conducted, when
differentiated on the basis of the MTAI
and certain of the MMPI scales, were
found to differ significantly on 11 of the
20 measures experimentally administered.

The direction of the significant differences
found in this study will not be
surprising to those familiar with public
education. It would be expected that
friendly teachers and teacher candidates
when compared to unfriendly ones would
be more intelligent, less authoritarian,
have a higher need for affiliation, and a
lower need for aggression, succorance, and
abasement.

Similar predictions could probably be
made for some of the scales on the Study
of Values. Certainly we would expect a
group of warm teachers to have a higher
social value than a group not so described.
However, no significant differences were
identified in this study for any of the Study
of Values Scales. It is altogether possible
that, in spite of the rational expectancy,
values as measured will not differentiate
between friendly and unfriendly teachers.


It is more probable that a major portion 
of the variance of the Study of Values 
Scales is accounted for by the values of the 
subcultures from which a sample of Ss is 
drawn. In the present study the sample
seemed to be homogeneous in subcultural
background, i.e., rural midwesterners.

The reversals on the needs for domi- 
nance and aggression as measured by the 
EPPS and the TAT were not predicted 
and are difficult to account for. Several ex 
post facto explanations were explored in 
an attempt to cast light on these unique 
results. However, none was consistent with
the other findings of the present study. 

Several implications may be drawn from
these data: (a) the so-called “warm
teacher scales” seem to be measuring a
personality configuration which can be
further analyzed by other psychological
techniques, (b) a combination of such
measures and scales may improve the
prediction of those who select candidates for
teacher training, (c) the positive results
of this investigation warrant further study
of the measurable personality characteristics
of teachers designated as “warm,”
“friendly,” and “rapport building.”

In regard to this latter point it is sug- 
gested that the above experimental con- 
ditions be repeated: (a) using a larger N 
and (b) with groups of student teachers 
and with teachers who have had some 
successful teaching experience. 


Summary 


A number of psychological scales,
empirically developed, have been considered
helpful in selecting teachers. The adjectives
used for teachers who looked good on
these scales are “warm,” “friendly,”
“rapport building,” etc. The present paper is
concerned with how individuals who score
high and low on these scales differ with
respect to other psychological measures.
The 10 persons scoring highest and the
10 lowest on the MTAI and four scales
of the MMPI (K, Ho, Pv, and Tp) from
a sample of 176 students at Colorado 
State College were given: (a) the Study 
of Values, (b) the EPPS, (c) the Cali- 
fornia F Scale, (d) the WAIS, and (e) 
the TAT. 

The high and low groups were found 
to differ significantly in intelligence, au- 
thoritarianism, and certain manifest and 
latent needs. 


REFERENCES 


Adorno, T. W., Frenkel-Brunswik, Else,
Levinson, D. J., & Sanford, R. N. The
authoritarian personality. New York:
Harper, 1950.

Allport, G. W., Vernon, P. E., & Lindzey,
G. Study of values. New York: Psychol.
Corp., 1952.

Cook, W. W., Leeds, C. H., & Callis, R.
Minnesota teacher attitude inventory.
New York: Psychol. Corp., 1952.

Cook, W. W., & Medley, D. M. Proposed
hostility and pharisaic-virtue scales for
the MMPI. J. appl. Psychol., 1954, 38,
414-418.

Edwards, A. L. Edwards Personal Preference
Schedule. New York: Psychol.
Corp., 1954.

Friedman, I. Objectifying the subjective: A
methodological approach to the TAT.
J. proj. Tech., 1957, 23, 243-247.

Gowan, J. C., & Gowan, M. S. A teacher
prognosis scale for the MMPI. J. educ.
Res., 1955, 49, 1-12.

Hathaway, S. R., & McKinley, J. C.
Minnesota multiphasic personality inventory.
New York: Psychol. Corp., 1943.

Meehl, P. E., & Hathaway, S. R. The K
factor as a suppressor variable in the
MMPI. J. appl. Psychol., 1946, 30, 525-564.

Murray, H. A., et al. Explorations in personality.
New York, London: Oxford
Univer. Press, 1938.

Murray, H. A., et al. Thematic apperception
test. Cambridge: Harvard Univer.
Printing Office, 1943.

Ryans, D. G. Report on the teacher characteristics
study. Wash., D. C.: Amer.
Council on Educ., in press.

Stephenson, W. The study of behavior.
Chicago: Univer. Chicago Press, 1953.

Wechsler, D. Wechsler Adult Intelligence
Scale. New York: Psychol. Corp., 1955.


Received August 22, 1958.


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 1, 1959


AN EVALUATION OF THE EFFECTIVENESS OF A
FRESHMAN MATHEMATICS COURSE¹


J. STANLEY AHMANN AND MARVIN D. GLOCK 


Cornell University 


The purpose of this study is to evaluate
the effectiveness of a one-semester experimental
course in applied mathematics designed
for college freshmen. The Ss used
were freshmen enrolled in the College of
Agriculture at Cornell University during
the years 1952 through 1955. The course
was evaluated in terms of their subsequent
achievement as represented by grade-point
averages, final marks in courses involving
mathematics, scores on a mathematics test,
and tendency to remain enrolled at Cornell
University.


METHOD 


The effectiveness of the experimental
course in applied mathematics
was evaluated by comparing an experimental
and control group in terms of (a) subsequent
achievement at Cornell University
(groups not stratified), (b) subsequent
achievement at Cornell University (each
group stratified into two subgroups in
terms of high school grade-point average),
(c) gain in knowledge of mathematics as
measured by the Cornell Mathematics
Test, and (d) tendency to remain enrolled
at Cornell University. Although both four-year
and two-year students were involved
in the evaluation, only the four-year students
are included in this report.


¹ This research project has been a cooperative
one. In addition to the authors, contributions
have been made by many other members
of the staff of the Cornell University
Testing and Service Bureau between 1947 and
195[?]. Although not a member of the Bureau
staff, H. A. Geiselmann of Cornell University
was a very active participant in the middle
and late phases of the project and wrote a doctoral
thesis concerning it. Some of his findings
are included in the following tables.




Cornell Mathematics Test 


The 1952 edition of the Cornell Mathematics
Test, hereafter referred to as the
CMT, is essentially a power test which 
measures achievement in arithmetic and 
basic algebra. It is composed of 55 multiple- 
choice test items and requires two and one- 
half hours of testing time. 

The degree to which this test was valid 
was established in several different ways. 
First, an attempt was made to guarantee a 
high degree of content validity by following 
a carefully designed plan of test construction.
The initial step was to construct sample
test items covering most of the common
arithmetic and algebraic operations. These 
were then sent in the form of a question- 
naire to 101 College of Agriculture faculty 
members and to instructors in the physical 
sciences who taught many agriculture stu- 
dents. They edited the items and added 
others. Each item was then rated in terms 
of the importance, insofar as their courses 
were concerned, of the mathematical oper- 
ations needed to solve it. These results were 
used to construct the 1951 edition of the 
test. Administrations of this edition pro- 
vided additional data about item difficulty 
and item discriminating power; this in- 
formation was used to develop the 1952 
edition. The correlation between this test 
and the Cooperative Mathematics Pre-Test 
for College Students was 0.71 (Geiselmann, 
1955, pp. 67-68). 

Additional unpublished studies to estab- 
lish the degree of validity of the CMT con- 
cerned concurrent validity. Students who 
had completed advanced algebra in high 
school significantly surpassed those who 
had completed only intermediate algebra 
in terms of this test. Also, those who had
completed intermediate algebra significantly
surpassed those who had completed
only elementary algebra. In addition, the
means of the mathematics test scores of
students who thought their high school
mathematics preparation was good
surpassed the means of those who thought the
opposite. Finally, seniors significantly sur- 
passed freshmen in terms of the CMT after 
individual differences in scholastic aptitude 
and prior academic achievement had been 
controlled. The last comparison included 
an analysis of covariance in which the Ohio 
State Psychological Examination scores 
and the Cooperative General Science Test 
scores were used as controls. 

Several reliability estimates were made.
The most recent computation using the
Kuder-Richardson Formula #20 yielded
an r value of 0.89 (Geiselmann, 1955, p.
66).
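
As an illustration of the reliability estimate named above, Kuder-Richardson Formula 20 can be computed from a matrix of right-wrong item scores. The sketch assumes the usual form, r = k/(k - 1) × (1 - Σpq / variance of total scores), with the population variance of the totals; the function name and data layout are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    responses: one list per examinee, each holding 0/1 item scores
    in the same item order.
    """
    n = len(responses)
    k = len(responses[0])
    total_variance = pvariance([sum(r) for r in responses])
    sum_pq = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / total_variance)
```

For the 55-item CMT, `responses` would hold one list of 55 scores per student.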


Experimental Course in Applied Mathematics


The content of the applied mathematics 
course was selected primarily on the basis 
of the item-by-item analysis of the student 
responses to the CMT, suggestions from 
members of the College of Agriculture fac- 
ulty, and analyses of the textbooks used by 
the students in agriculture and physical 
science courses. The principal topics cov- 
ered were computation with a slide rule, 
fundamentals of algebra, equations, sig- 
nificant figures and approximation, ratio 
and proportion, trigonometry of the right 
triangle, graphs, and logarithms (Geisel- 
mann, 1956). All of these topics are typi- 
cally a part of junior high school arithmetic 
and elementary algebra programs. 

The course was taught three hours per 
week for 15 weeks during the fall term. 
Each term there were six sections of about 
25 students per section. The same instruc- 
tor taught all sections all terms. 


Selection of Subjects 


The top 20% of each incoming freshman 
class in terms of mathematics skill as measured
by the CMT was excluded from the
study. Because of class scheduling prob- 
lems, this group had to be identified before 
this test could be administered. In antici- 
pation of this, earlier samples of freshmen 
were tested, and a regression equation was 
computed in which the number of high 
school mathematics courses, the high school 
grade-point average in mathematics, and 
the high school grade-point average, not 
including mathematics, were used to predict
CMT scores. To obtain the samples
used in this present study, predictions were
made for each entering freshman and he
became a part of the sample or was rejected
according to the size of his predicted
CMT score. Also excluded from the study,
irrespective of the size of their CMT scores,
were all women students, transfer students, 
foreign students, and part-time students. 
Half of the freshmen in the eligible group 
were enrolled in the applied mathematics 
course and served as the experimental 
group; every second student was placed in 
the control group which did not receive any 
formal instruction in mathematics. 
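
The selection and assignment procedure can be sketched as follows. The regression weights shown are invented placeholders, since the article does not report the equation, and the order in which eligible students were alternated is also an assumption (here, descending predicted score).

```python
def predicted_cmt(n_math_courses, math_gpa, other_gpa,
                  weights=(-20.0, 2.0, 0.4, 0.2)):
    """Apply a three-predictor regression equation.

    The default weights are hypothetical; the actual equation is not reported.
    """
    b0, b1, b2, b3 = weights
    return b0 + b1 * n_math_courses + b2 * math_gpa + b3 * other_gpa


def select_and_assign(freshmen, top_fraction=0.20):
    """Drop the top fraction by predicted score, then alternate assignment.

    freshmen: list of (student_id, n_math_courses, math_gpa, other_gpa).
    Returns (experimental ids, control ids).
    """
    scored = sorted(
        ((predicted_cmt(*f[1:]), f[0]) for f in freshmen), reverse=True
    )
    n_excluded = round(len(scored) * top_fraction)
    eligible = [student_id for _, student_id in scored[n_excluded:]]
    # Every second eligible student goes to the control group.
    return eligible[0::2], eligible[1::2]
```

Women, transfer, foreign, and part-time students would be removed from `freshmen` before the call, as the text specifies.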


RESULTS 


Subsequent Achievement (Groups not 
Stratified) 


The experimental group was compared 
with the control group in terms of six cu- 
mulative grade-point averages and final 
marks in eight introductory courses involving
mathematics. In each instance an analysis
of covariance (Wert, Neidt, & Ahmann,
1954, pp. 343-363) was computed. Two
control variables were used, viz., the Ohio 
State University Psychological Examina- 
tion and the CMT. The results for a three- 
year period are summarized in Table 1. 

In only two instances were the differences
between the means significant. Both
involved final marks in chemistry courses. In
one case the difference between the means
favored the experimental group; in the
second case it favored the control group.
In summary, the influence of the experimental
course could not be detected to any
important degree in subsequent Cornell
achievement when the two groups were not
stratified.


TABLE 1

COMPARISON OF EXPERIMENTAL AND CONTROL GROUPS BY MEANS OF ANALYSES
OF COVARIANCE IN TERMS OF SUBSEQUENT ACHIEVEMENT

                                        N               Mean
Criterion                           Exp.    Con.    Exp.    Con.    F value
Grade-point averages at end of:
  One semester                      271     255     72.0    71.4    2.47
  Two semesters                     246     222     72.5    73.4    1.39
  Three semesters                   112     105     74.6    75.4    1.13
  Four semesters                     99      90     74.9    75.8    1.15
  Five semesters                     51      36     75.5    76.3    0.64
  Six semesters                      50      34     75.9    76.9    0.99
Final marks in:
  Chemistry 105                     119     121     64.0    61.2    3.98*
  Chemistry 106                      83      74     67.2    67.7    0.14
  Chemistry 101                     152     133     68.1    70.9    2.91
  Chemistry 102                     116     111     68.3    71.7    6.12*
  Agr. Engr. 1                       52      89     76.2    74.7    2.72
  Physics 103                        55      36     67.9    66.9    0.44
  Physics 104                        43      27     68.3    69.2    0.04
  Mathematics 161                     9      18     65.0    68.1    0.01

* Significant at .05 level.

Subsequent Achievement (Groups Stratified)

Both the experimental and control
groups were stratified into two subgroups
in terms of high school grade-point average.
The upper group in each case had averages
of 85 or higher; the lower group had averages
of below 85. The experimental subgroups
were compared to their corresponding
control subgroups in terms of two
cumulative grade-point averages and final
marks in seven introductory courses in
chemistry, engineering, and physics. In
each instance an analysis of covariance was
computed, using the same two control variables
mentioned earlier. The results for a
three-year period are summarized in Table
2.

Again, only two significant differences
were found, one favoring the experimental
group and one the control group. In the

first case the criterion was final marks in 
chemistry and in the second case final 
marks in agricultural engineering. As in the 
case of Table 1 the results shown in Table 
2 reveal little or no improvement in subse- 
quent achievement which can be attributed 
to the experimental course in applied math- 
ematics. 


Gain in Mathematical Knowledge 


The CMT was administered to both ex- 
perimental and control groups as a pretest 
in late September and again as a final test 
in early May, over seven months later. 
Each student’s gain was computed by find- 
ing the difference between his final test 
score and his pretest score. By means of an 
analysis of variance it was found that the 
mean gain of the experimental group (5.5) 
significantly surpassed the mean gain of the 
control group (2.2). 
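
The gain-score comparison can be illustrated with a short computation. This is a sketch of a one-way analysis of variance on gain scores, assuming the ordinary between-groups over within-groups F ratio; the article gives no computational detail, and the function names are hypothetical.

```python
from statistics import mean


def gains(pretest, final):
    """Gain for each student: final test score minus pretest score."""
    return [f - p for p, f in zip(pretest, final)]


def anova_f(*groups):
    """One-way analysis of variance F ratio for two or more groups."""
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = sum(len(g) for g in groups) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

With two groups, the F ratio equals the square of the pooled-variance t ratio, so either test leads to the same decision.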


TABLE 2
COMPARISON OF STRATIFIED SUBGROUPS OF EXPERIMENTAL AND CONTROL
GROUPS BY MEANS OF ANALYSES OF COVARIANCE IN TERMS OF
SUBSEQUENT ACHIEVEMENT

                                              N               Mean
Criterion                     Subgroup    Exp.    Con.    Exp.    Con.    F value
Grade-point averages at end of:
  One semester                Upper       129     143     73.8    73.4    0.89
                              Lower       142     103     70.4    69.7    1.01
  Two semesters               Upper       121     130     74.2    74.6    0.12
                              Lower       125      89     70.9    71.9    0.77
Final marks in:
  Chemistry 105               Upper        71      71     65.8    64.1    1.09
                              Lower        40      40     60.1    57.0    1.88
  Chemistry 106               Upper        52      48     68.3    70.3    1.08
                              Lower        26      24     65.4    62.6    1.42
  Chemistry 101               Upper        60      66     69.7    73.2    1.95
                              Lower        92      63     67.0    69.1    1.22
  Chemistry 102               Upper        49      62     70.7    72.4    0.16
                              Lower        67      49     66.3    71.1    7.88*
  Agr. Engr. 1                Upper        42      74     80.6    77.4    4.49*
                              Lower        48      44     73.8    72.8    0.54
  Physics 103                 Upper        29      25     69.7    69.6    1.16
                              Lower        26      11     65.9    60.5    2.81
  Physics 104                 Upper        23      20     69.7    70.6    1.44
                              Lower        20       7     66.6    64.9    0.17

* Significant at .05 level.


Tendency to Remain in College


The experimental and control groups
were compared in terms of the tendency of
their members to remain enrolled in Cornell
University. These comparisons were
made at the end of the semester in which
the applied mathematics course was taught,
at the end of the first semester following,
and again at the end of the second semester
following. Chi-square analyses were computed
for each semester individually and
for the semesters cumulatively. A dropout
was defined as a student who left the University
with a cumulative grade-point average
of less than 70.

The results for a three-year period are 
summarized in Table 3. The experimental 
group contained 278 students, the control 
group, 269. In two of the comparisons sig- 
nificant chi-square values were found. 
These and the nonsignificant values show 
that the attrition rate was initially much 
greater for the control group than it was 
for the experimental group. Later it was
greater for the experimental group. By the
end of the second semester following the
one in which the mathematics course was
taught, the cumulative attrition for the
two groups was almost the same, about
25%. It seems that the mathematics course
was instrumental in delaying the semester
in which a student might leave the University
but did not reduce the over-all tendency
to drop out.
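
The individual-semester comparisons can be checked with the standard chi-square formula for a 2 x 2 table (dropout or not, by group). The sketch below assumes no continuity correction, which the article does not mention; the counts in the usage note are the dropout numbers and group sizes reported in Table 3 and the text.

```python
def chi_square_2x2(drop_exp, n_exp, drop_con, n_con):
    """Chi-square (1 df, no continuity correction) for dropouts by group."""
    a, b = drop_exp, n_exp - drop_exp  # experimental: dropped, stayed
    c, d = drop_con, n_con - drop_con  # control: dropped, stayed
    n = n_exp + n_con
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

For example, `chi_square_2x2(7, 278, 13, 269)` gives about 2.1 and `chi_square_2x2(14, 278, 26, 269)` about 4.3, in line with the values tabled for the first two individual semesters.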


Student and Faculty Opinion 


Students in the experimental group were
asked their opinion of the mathematics
course following the completion of it. Their
faculty advisers asked them whether such
a course should be offered. Sixty-six per
cent responded affirmatively. About the
same percentage of the faculty advisers
responded in the same manner. In addition, a
questionnaire administered by the instructor
at the end of the mathematics course
showed that 51% of the experimental group
thought that the course would be worthwhile
for all agriculture freshmen, 91%
thought that it would be worthwhile for
those poorly prepared in mathematics, and
[illegible]% thought that they were at least fairly
well prepared in mathematics because of
their work in the course. In the opinion of
the majority of the students in the experimental
group and their advisers, the mathematics
course was a success.


DISCUSSION AND CONCLUSIONS


The foregoing results do not indicate
that the experimental course in applied
mathematics made a pronounced impact
on the subsequent academic success
of agriculture students. It is possible that a
part of the influence of the experimental
mathematics course was nullified by the
manner in which the courses in agriculture
and physical sciences were taught. Because
many of the students were known to be
deficient in mathematics, some of the instructors
supplemented their courses with
small amounts of mathematics training
thought to be necessary. No doubt this
practice continued during the evaluation of
the mathematics course. If so, both experimental
and control group students received
this instruction. The mathematics course is
being continued and attempts to evaluate
its effectiveness with respect to specific
types of students (e.g., overachievers and
underachievers) are being made.


TABLE 3
COMPARISON OF EXPERIMENTAL AND CONTROL GROUPS BY MEANS OF CHI-SQUARE
ANALYSES IN TERMS OF TENDENCY TO DROP OUT

                                                 No. of      Per cent    Chi-square
Semester                               Group     dropouts    dropouts    value
Individual:
  During semester of the course        Exp.         7          2.5
                                       Con.        13          4.8        2.1
  During first semester following      Exp.        14          5.0
                                       Con.        26          9.7        4.3*
  During second semester following     Exp.        50         18.0
                                       Con.        34         12.6        3.0
Cumulative:
  End of first semester after course   Exp.        21          7.6
                                       Con.        39         14.5        3.9*
  End of second semester after course  Exp.        71         25.5
                                       Con.        73         27.1        0.2

* Significant at the 5% level.


SUMMARY


The effectiveness of an experimental
course in applied mathematics for agriculture
freshmen (except those whose mathematical
skill is superior) was evaluated by
comparing an experimental and a control
group in terms of subsequent grade-point
averages, final marks in courses involving
mathematics, gains in knowledge of mathematics,
and tendency to remain enrolled at
Cornell University. In addition, the opinions
of the students enrolled in the course and their
faculty advisers concerning the course were
obtained.

The statistical evidence does not indicate
that the experimental course influenced
subsequent achievement or ultimate attrition
rate to any important degree. However,
the student and faculty opinions concerning
the worth of the course were
favorable.


REFERENCES 


Geiselmann, H. A. The effectiveness of a
mathematics review course for freshmen
in the College of Agriculture at Cornell
University. Unpublished doctoral dissertation,
Cornell Univer., 1955.

Geiselmann, H. A. Mathematical deficiencies
of college freshmen. Math. Teach., 1956,
49, 22-29.

Wert, J. E., Neidt, C. O., & Ahmann, J. S.
Statistical methods in educational and
psychological research. New York:
Appleton-Century-Crofts, 1954.


Received October 4, 1958.




THE JOURNAL OF
EDUCATIONAL PSYCHOLOGY

Volume 50          April, 1959          Number 2


LEVEL OF ASPIRATION AND ACADEMIC SUCCESS¹


LEONARD WORELL 
State University of Iowa 


Interest in the predictive value of
“nonintellectual” factors in academic achievement
has recently increased as a consequence,
in part, of the limited predictive
value of conventional academic aptitude
measures. Only a small portion of the variance
in academic success can be attributed
to the variance in measures of ability and
previous academic achievement. Unfortunately,
the findings of available studies
dealing with “nonintellectual” variables do
not seem to contribute appreciably to either
the practical aim of improved prediction
or the theoretical purpose of increasing
the generality of behavior theories.

The present study employs the level of
aspiration paradigm in predicting academic
achievement. Most commonly, the level of
aspiration refers to the subsequent level of
performance which an individual states he
anticipates achieving following performance
on a task. Major emphasis is on the
discrepancy between previous performance
and anticipated subsequent performance. It
has been recognized for some time, however,
that level of aspiration is not a unitary
concept. The use of a number of levels of
aspiration is exemplified in the work of
Lewin, Dembo, Festinger, and Sears
(1944), where the levels a person hoped
for, expected, and was minimally satisfied
with in achieving were viewed as separately
meaningful. The level of aspiration, therefore,
may be conceptualized more broadly
and a variety of discrepancy scores may be
obtained depending upon the nature of the
questions which elicit aspiration statements.
Not only may discrepancy scores be
derived from expectancy estimates and previous
performance, but also from a comparison
of wished for performance and
previous performance, wished for performance
and expected performance, and so on.
In the present study, discrepancy scores
are obtained by using various estimates of
an individual's own performance rather
than his actual performance.

¹ This research was supported in part by
a grant-in-aid from the Social Science Research
Council. The author wishes to thank
the members of the Reed College faculty for
their cooperation in this project. Particular
appreciation is extended to Fred Courts,
under whose direction the questionnaire employed
in the present study was developed.

Discrepancy scores derived from esti- 
mates of performance may also be viewed 
as reflecting a dimension of reality-irreality. 
The relationship of level of aspiration to a 
reality-irreality dimension has appeared in 
the work of Lewin et al. (1944) and Rotter 
(1954). In this study, the reality-irreality 
dimension is defined as any discrepancy be- 
tween estimated performance or effort and 
some other aspiration estimate. The larger 
a discrepancy the more an individual may 
be regarded as unrealistic. 

The general hypothesis of the study is 
that the reality-irreality dimension is re- 
lated to academic success. It is assumed 
that persons with highly discrepant scores
base their estimates of performance on unrealistic
considerations of a wishful or
avoidant nature. Such individuals, when
faced with academic performance situations,
are expected to invoke more unrealistic
and avoidant responses than Ss with
lower discrepancy scores. Thus, for example,
of two persons with identical estimates
of previous performance but divergent es- 
timates of subsequent performance, the one 
with the more discrepant score is expected 
to perform more poorly since achievement 
situations for him evoke more unrealistic 
behaviors. It should be noted that we are 
not concerned here with the factors leading 
to the development of unrealistic behavior, 
but only with the relationship of this be- 
havior to academic performance. 


EXPERIMENTAL HYPOTHESES 


Level of aspiration measures were con- 
structed on the basis of the following ques- 


tions, which were rated by all Ss on a decile 
scale: 


1. How hard do you work on your studies 
relative to other students? 


2. How do you think your average grades 
compare with those of your classmates? 

3. If you plan to return next year, how 
well do you expect to do in comparison with 
other members of your class? 

4. If you really tried to do well and worked 
near the limits of your capacity, how would 
your average grades compare with those of 
your classmates? 


5. How well would you like to do in order
to be reasonably well satisfied according to
your own standards?


The following predictions refer to the 
combination of pairs of the preceding esti- 
mates. 

Academic adjustment is inversely related 
to the degree to which students’ estimates 
of: 


1. What their performance would be when 
working near the limits of capacity exceed 
their estimates of how hard they have worked 
in the past (Questions 4 minus 1). 

2. Their future performance are above
their estimates of previous performance
(Questions 3 minus 2).

3. What their performance would be when
working near the limits of capacity are above
their estimates of previous performance
(Questions 4 minus 2).

4. What they would be reasonably satisfied
with in their performance exceeds their
estimates of previous performance
(Questions 5 minus 2).
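
The scoring implied by the four predictions can be sketched in a few lines of Python. This is only an illustration: the field names, the 1-10 coding of the decile scale, and the example ratings are assumptions and are not reported in the study.

```python
from dataclasses import dataclass


@dataclass
class Ratings:
    """One student's decile ratings on Questions 1-5 (assumed coded 1-10)."""
    effort: int        # Question 1: how hard the student works
    previous: int      # Question 2: estimated average grades so far
    expected: int      # Question 3: expected standing next year
    capacity: int      # Question 4: standing if working near capacity
    satisfaction: int  # Question 5: standing needed for reasonable satisfaction


def discrepancy_scores(r: Ratings) -> dict:
    """Return the four discrepancy scores named in the predictions."""
    return {
        "capacity_minus_effort": r.capacity - r.effort,              # Questions 4 minus 1
        "expected_minus_previous": r.expected - r.previous,          # Questions 3 minus 2
        "capacity_minus_previous": r.capacity - r.previous,          # Questions 4 minus 2
        "satisfaction_minus_previous": r.satisfaction - r.previous,  # Questions 5 minus 2
    }


# Hypothetical student, for illustration only.
example = Ratings(effort=4, previous=5, expected=8, capacity=10, satisfaction=7)
scores = discrepancy_scores(example)
print(scores)
```

Under the general hypothesis, larger positive values on any of the four scores would mark a student as less realistic.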


METHOD
Subjects 


The study was conducted in a small liberal
arts college which places a strong emphasis
on scholastic achievement. The college
attempts, however, to encourage a
desire for learning rather than emphasize
grade point achievement. Thus, grades are
assigned for administrative purposes but
are not typically revealed to the student
by the instructors. The student is given
only an indication of his relative decile
standing at the end of each year. Students
in the freshmen class, in contrast to the
other classes, had been given no indication
of their relative positions, since the study
was conducted prior to the end of their
first year.

The 421 students used in this study were
drawn from the entire student body of
about 550 for the academic year 1952-53.
Subjects participated on a voluntary basis.
The ratio of males to females in both the
sample and the total enrollment is approximately
four to one.


Materials 


Two major types of data were obtained:
level of aspiration data and criterion measures.
The latter included (a) total college
grade average, computed by placing each
student into one of three categories, above
C, C, and below C, for each year of his
attendance, and then averaging across years;
(b) grade decile rank during the year that
the information was obtained; (c) survival,
i.e., attainment or nonattainment of a degree.


LEVEL OF ASPIRATION AND ACADEMIC SUCCESS 49 


TABLE 1 


RELATIONSHIP OF APTITUDE AND HIGH SCHOOL ACHIEVEMENT TO 
DECILE RANK AND TOTAL GRADE AVERAGE


Freshmen (N — 138) Sophomores (N — 97) 
CEEB | HSA DR TGA ACE HSA DR TGA 
-Aptitude 35* 34* 
p — Š 2 .30* — 
High School — -53* .44* = ui FE 
Achievement ` ! 
Juniors (V = 76) Seniors (N = 78) ` 
ACE HSA DR TGA ACE HSA DR TGA 
a 
Aptitude 
r = LB Ge | as — As" | 38 19 
High School 
S — 4 «94* | 2a — | .23** | .36* 
Achievement T7 3 
{Significant at .01 level. 
Significant at .05 level. L 
RESULTS AND DISCUSSION


Efficiency of Traditional Measures


Previous studies (Henry, 1950) support
the utility of aptitude and high school
achievement measures in predicting college
success. This study used the ACE with
sophomores, juniors, and seniors, and the
CEEB with the freshmen. Table 1 shows that
the CEEB, but not the ACE, correlates
significantly with grade decile rank and total
grade average. In addition, high school
achievement is clearly superior to aptitude
in prediction and little (Table 2) is gained
by combining high school achievement and
aptitude.


TABLE 2

MULTIPLE CORRELATIONS OF APTITUDE AND
HIGH SCHOOL ACHIEVEMENT TO DECILE
RANK AND TOTAL GRADE AVERAGE

Class          DR     TGA
Freshmen       .55    .47
Sophomores     .34    .43
Juniors        .35    .23
Seniors        .23    .39


Aspiration and Scholastic Success


Prior to an examination of the results
for the predictions, it may be
worthwhile to determine the stability of the
aspiration questions. Previous studies
(Lewin et al., 1944), employing relatively
short time intervals, suggest that level of
aspiration is a reasonably consistent measure.
The stability of our aspiration measures
over a four-year period on a group of
40 is presented in Table 3. Considering the
number of factors which may potentially
contribute to changes in estimates over a
relatively long time span, the consistency
of estimates is quite high.

The following four predictions were examined:
Academic adjustment is inversely
related to the degree to which students’
estimates of (a) what their performance 
would be when working near the limits of 
capacity exceed their estimates of how hard 
they have worked in the past; (b) their 
future performance are above their esti- 
mates of previous performance; (c) what 




TABLE 3

STABILITY OF ASPIRATION ESTIMATES
OVER FOUR-YEAR PERIOD

Estimate                                  r*
1. Effort                                .54
2. Previous Academic Performance         .61
3. Expected Performance Next Year        .56
4. Capacity Performance                  .52
5. Reasonable Satisfaction Level         .48

* .39 required for significance at .01 level.


their performance would be when working 
near the limits of capacity are above their 
estimates of previous performance; (d) 
what they would be reasonably satisfied 
with in their performance exceeds their 
estimates of previous performance. 

With the exception of the nonsignificant 
relationship to decile rank in the senior 
group in Prediction a, the results (Table 
4) show significant relationships between 
aspiration discrepancy scores and both decile
rank and total grade average in all other
comparisons. We may conclude that the
student who behaves unrealistically, or
more specifically, the one who perceives his
reasonable level of performance satisfaction
as lying above previous achievement,
has aspirations markedly beyond past
performance, estimates his potential capacity
performance as lying far above the effort
he expends, and believes that he can
achieve far beyond what he already has by
pressing himself to the limit of his ability,
will tend to attain and continue to attain
a lower scholastic standing. On the other
hand, the student who holds moderate
aspirations, perceives his effort as being
commensurate with his potential capacity
performance, does not see his performance as
markedly improving by making a “total
push,” and whose standards of acceptable
satisfaction are below his previous achievement,
tends to obtain grade success.

Note may be taken of the generally declining
values of the correlations with grade
average from the sophomore to the senior


TABLE 4 


INTERCORRELATIONS AMONG PREDICTOR VARIABLES AND THE RELATIONSHIP OF THE
PREDICTORS TO DECILE RANK AND TOTAL GRADE AVERAGE^a


123| #| 54 é| D |171CA | 1]2|a]|4| s | 6 | pr | TG^ 
Freshmen (N —138) Juniors (V — 85) 
1392.60.28, .35|—.16|—.22**|—.26** 1 | 49] .es| .19| .17|—.24|—.25* |- 225, 
2 -73| .71|— .06|— .32|— .56**|.— .48**| 2 .09| .45| .19|—.21|— .28**|— 425, 
3 66| .01|— .40|— .57**|.—.63**| 3 64| .04|— 17 — .53**| — 577, 
4 -26|— .36|— .65**|.— .5g**| 4 —.13| .14|-.22* |—.52 
Sophomores (V = 99) Seniors (N = 75) 
1 | .40/ .69 .17 .14[—.22—.32*«—.27**| 1 | — | .ao| .19|-.3e| .oel-.o9 |-.2* 
2 69| .71| .38|—.14|— .52+*|— .45**| 2 SS | [ez 
A 65|—.17|— .17|— .60**— .70*^| 3 77| .04|— .28|— .48**— 357, 
1 — 15|- .24|— .57** — ‘sg+s| 4 14|- :34|— .38**|- -20 


^a Variables include: 1. Estimated capacity performance minus estimated effort. 2. Estimated expected
performance minus estimated previous performance. 3. Estimated capacity performance minus estimated
previous performance. 4. Estimated reasonable satisfaction level minus estimated previous performance.
5. Aptitude. 6. High school achievement.
* Significant at .05 level. 
** Significant at .01 level. 




years. Tentatively, at least two factors may
be involved in this progression: either those
with highly discrepant scores have discontinued
their education, thus making each
successive group more homogeneous, or
some modification may occur in the apparent
disabling aspects of high standards.
An examination of the scores obtained by
the four classes indicates that they do not
differ with regard to means and variances
of discrepancy scores. It would appear,
then, that although highly discrepant
scores are associated with poor early attainment,
some students are capable of
learning to adjust to unrealistic standards
and to function adequately.

The high degree of relationship apparent
among the predictors (Table 4) may be attributable
to the common underlying dimension
of reality-irreality. On the other
hand, these relationships may also point to
a tendency of an individual to rate himself
consistently in a given direction and
may involve response sets analogous to
those proposed by Cronbach (1950).


Multiple Correlations 


Table 5 contains the multiple correlations
for ability and high school achievement,
the aspiration indices, and the combined
measures with both decile rank and total
grade average. The use of all six variables
yields the highest multiple correlations
with both academic criteria. Although
somewhat lower than those obtained with
all six variables, the multiple Rs with the four
aspiration discrepancy measures exceed
those obtained with the ability and high
school achievement indices. The latter correlations
strongly support the value of
relatively nonintellectual indices in
predicting academic success.


Survival 
In approaching the problem of mortality,
the aspiration discrepancy scores were
combined into a single score (AS) for each
student. This procedure seemed justified
since the correlation of the four additively
combined aspiration scores with total grade
average yielded values of -.57 and -.54
for the freshmen and sophomore classes
respectively, both of which are significant
beyond the .01 level. Using the AS, an
empirical cutting point was determined for
discriminating between graduates and
nongraduates. By adopting a cutting score of
9 and below versus above 9, a chi square of
16.14, significant beyond the .001 level, was
obtained for the freshmen group. It is
apparent from Table 6 that the AS is not
equally discriminatory for both high and
low scorers. Freshmen high scorers are
differentiated into graduates and nongraduates
with reasonable accuracy while low
scorers are not. With the sophomore group,
a chi square of 7.64 is obtained, significant
beyond the .01 level. In contrast to the
freshmen, sophomore high scorers are not
as clearly differentiated into graduates and
nongraduates, while low scorers are more
satisfactorily separated. Finally, Table 6
indicates that the AS is generally less
satisfactory for the juniors. The chi square is
reduced to 6.08, significant at less than the
.02 level. Furthermore, in contrast to the
freshmen and sophomore groups, high
scorers are not satisfactorily separated into


TABLE 5
MULTIPLE CORRELATIONS OF ASPIRATION
AND INTELLECTUAL MEASURES TO DECILE
RANK AND TOTAL GRADE AVERAGE

                     Freshmen     Sophomores   Juniors      Seniors
Variables^a          DR    TGA    DR    TGA    DR    TGA    DR    TGA
5, 6                 .55   .47    .34   .43    .35   .23    .23   .39
1, 2, 3, 4           .68   .69    .66   .79    .75   .63    .48   .39
1, 2, 3, 4, 5, 6     .75   .72    .74   .85    .70   .66    .50   .52

^a The correlations at the senior level do not include
Variable 2. For key to numbered Variables 1-6 see bottom
of Table 4.


TABLE 6
FREQUENCIES OF FRESHMEN, SOPHOMORE,
AND JUNIOR GRADUATES AND NONGRADUATES
WITH HIGH AND LOW AS SCORES

                        AS Scores
                      >9     9 and <
Freshmen
  Graduates            7       36
  Nongraduates        41       32
Sophomores
  Graduates           11       35
  Nongraduates        24       20
Juniors
  Graduates           12       46
  Nongraduates        15       16


graduates and nongraduates, whereas low 
scorers are differentiated more accurately. 
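
As a rough check on the survival analysis, the sketch below sums four discrepancy scores into an AS score, applies the cutting score of 9, and computes a chi square for each class from the Table 6 frequencies. The article does not say which chi-square formula was used; the continuity-corrected (Yates) form for a 2 x 2 table is an assumption here, chosen because it reproduces the reported values. The function names are hypothetical.

```python
def as_score(discrepancies):
    """Additively combine a student's four discrepancy scores into one AS score."""
    return sum(discrepancies)


def is_high_scorer(score, cutting_score=9):
    """High scorers are those above the cutting score (9 in the text)."""
    return score > cutting_score


def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Frequencies as read from Table 6:
# (graduates >9, graduates <=9, nongraduates >9, nongraduates <=9)
TABLE_6 = {
    "Freshmen": (7, 36, 41, 32),
    "Sophomores": (11, 35, 24, 20),
    "Juniors": (12, 46, 15, 16),
}

chi_squares = {
    group: round(yates_chi_square(*cells), 2) for group, cells in TABLE_6.items()
}
print(chi_squares)  # {'Freshmen': 16.14, 'Sophomores': 7.64, 'Juniors': 6.08}
```

With these frequencies the corrected statistic agrees with the values given in the text (16.14, 7.64, and 6.08).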


Aspiration, Grades, and Attrition 


It may reasonably be asked what rela- 
tionship grades have to both the AS and 
attrition. Table 7 shows that among below 
C students a significantly greater number 
drop out than remain to receive a degree 
(50 vs. 11), while the reverse is found 
among C and above C students (60 vs. 94).
Though grades are an obvious factor in
mortality, these data indicate the presence 
of a large group with grades of C and above 
who leave for other reasons. 

Since the greatest attrition occurs during 
the freshmen and sophomore years (Re- 
search Report, 1953), these classes have 
been combined to increase our totals. The 
portion of Table 7 dealing with below C
students indicates that, despite the relatively
large number of high scoring nongraduates,
the AS fails to differentiate
graduates from nongraduates at a significant
level. On the other hand, students
with grades of C and above are successfully
differentiated into graduates and nongraduates.
The chi square of 14.35 is significant
beyond the .001 level. Thus, although
grades are significantly related to attrition,
the AS makes a preliminary contribution
toward discriminating graduates from
nongraduates in a group which is not
experiencing scholastic deficiency.

While the aspiration measures have been
shown to be related to academic success,
one factor which might enhance the obtained
relationships should be considered.
This involves the association between some
of the discrepancy aspiration scores and
previous performance. A number of studies
have found that goal discrepancy scores are
to some extent negatively related to the
performances from which they are derived.
This stems primarily from two factors: (a)
the greater opportunity for individuals low
in performance to set higher goals, thereby
obtaining larger discrepancy scores; or
(b) forcing individuals at the extremes to
set goals within a restricted range or in a
given direction. Thus, those with high
performances would be restricted in their
estimates at the upper limit, while those with


TABLE 7
FREQUENCIES OF GRADUATES AND
NONGRADUATES AMONG STUDENTS
DIFFERENTIATED BY GRADES
AND AS SCORES

                          AS Scores
                        >10    10 and <

Below C Students^a
  Graduate                5       6
  Nongraduate            10      40

C and Above C Students^b
  Graduate               78      16
  Nongraduate            32      28

^a Below C Students χ² = 1.93 (NS).
^b C and Above Students χ² = 14.35 (.001).




very low performances would have little 
choice but to set goals at or above pre- 
viously attained levels. These conditions 
leave open the possibility that the discrepancy
scores predict well because of their
dependence on previous performance. A 
similar argument has been advanced by 
Schultz and Ricciuti (1954). 

There are two aspects of this study, however,
which suggest that this interpretation
of our findings is only partially and minimally
correct. First, if including previous
performance in one of the criteria (total
grade average) were influential, one would
expect that the correlations between aspiration
measures and total grade average
would increase from the freshmen to the
senior classes. This is apparent since the
more advanced students would have more
previous performance included in
the criterion. This expectation is not borne
out by the results (Table 5), in that the
sophomore correlations are the highest, followed
by the freshmen and juniors, with
the seniors showing the lowest correlations.
Thus, the magnitudes of the relationships
are almost the reverse of expectation for
the classes. The second consideration
is that the success of the aspiration
measures with regard to attrition among
relatively successful students (C and above
in terms of average grades) suggests that factors
other than previous performance are
being assessed by these measures.
Since a number of studies (Gould &
Kaplan, 1940; Holt, 1946; Schultz & Ricciuti,
1954) have been unsuccessful in relating
level of aspiration measures to academic
success, it is worthwhile to note some
of the differences between the present study
and earlier ones. At least two differences
may be noted. The first of these is the
type of level of aspiration task employed;
in contrast to the use of somewhat artificial
tasks, the present study obtained estimates
that were highly related to the type of
performance which we wished to predict. A
second possible factor is that, contrary to
customary procedure, we did not use the
actual previous performance level of the
student but rather the student’s estimates
of his previous performance, in a setting
where students maintain only a general
impression of their grade standing.
This procedure may increase the probability
of reality distortion. The further utility
of this procedure will have to be evaluated
by subsequent research.




SUMMARY AND CONCLUSIONS


This study attempted to determine the 
theoretical and empirical utility of the level 
of aspiration method in predicting college 
grades and attrition. The general assump- 
tion was that discrepancy scores between 
various estimates related to academic per- 
formance reflected the reality-irreality level 
at which the individual operated. Individ- 
uals with large discrepancy scores were 
considered unrealistic and were expected 
to perform more poorly in academic situ- 
ations since they would be more likely to 
employ unrealistic and avoidant problem 
solution behaviors. 

Discrepancy measures related to the gen- 
eral hypothesis were obtained from the 
combination of estimates given to varying 
pairs of five level of aspiration questions. 
The college sample, consisting of almost 
the entire student body of a small liberal 
arts college, estimated their position on a
decile scale according to how hard they 
worked, previous performance, expected 
future performance, performance when 
working near the limits of capacity, and 
level of reasonable satisfaction. 

The results suggest the following con- 
clusions: 

1. Predictions of academic performance
using ability and high school achievement
measures yield multiple correlations for the
four groups ranging from .23 to .47. Almost
as high relationships were obtained with
high school achievement alone.


2. The correlations between the discrepancy
aspiration scores and academic performance
provided strong support for the
specific hypotheses dealing with the reality- 
irreality dimension. The student who held 
aspirations close to previous performance, 
who perceived his level of reasonable per- 
formance satisfaction as lying below pre- 
vious performance, who believed that he 
would not achieve considerably beyond 
what he already has or his previous effort
by exerting himself to the limits of his ca- 
pacity, tended to be successful in grade 
achievement. 

3. The magnitudes of the intercorrela- 
tions among the four predictors suggested 
that they were measuring a common factor 
to a high degree. 

4. The combination of the four predic- 
tors in a multiple correlation indicated that 
higher predictability is obtained with these 
relatively “nonintellectual” measures than 
with the ability and high school achievement
variables. The highest multiple correlations,
however, were obtained through
the combined use of both “intellectual” 
and “nonintellectual” measures. 

5. An attempt was made to differentiate 
graduates from nongraduates through the 
summation of the four predictors into a 
single index. Successful discriminations 
were made for the freshmen and sopho- 
more groups, but the discriminative power 
of the index was considerably reduced for 
the juniors. 

6. A further attempt was made to determine
whether the discriminatory power
of the aspiration index was related to the 
academic standing of the student. This 
analysis revealed that the index was un- 
successful in differentiating graduates from 
nongraduates among below C students but 
was effective among the C and above C 
students. 


In general, the results provide support 
for the fruitfulness of the level of aspiration
method in predicting academic performance
and attrition among a superior
group of students. It should be emphasized 
that the success of the aspiration measures 
provides empirical confirmation of the util- 
ity of the level of aspiration approach 
rather than offering a practicable instru- 
ment for selection. Since the nature and 
setting of the particular sample employed 
was unique, one might not expect that such 
nonintellectual variables would play as 
meaningful a role among less homogeneous 
and less academically oriented students. 


REFERENCES 

Cronbach, L. J. Further evidence on response
sets and test design. Educ. psychol.
Measmt., 1950, 10, 3-31.

Gould, R., & Kaplan, N. The relationship of
"level of aspiration" to academic and
personality factors. J. soc. Psychol., 1940,
11, 31-40.

Henry, E. R. Predicting success in college
and university. In D. H. Fryer & E. R.
Henry (Eds.), Handbook of applied psychology.
Vol. 2. New York: Rinehart,
1950.

Holt, R. R. Level of aspiration: Ambition
or defense? J. exp. Psychol., 1946, 36,
398-416.

Lewin, K., Dembo, T., Festinger, L., & Sears,
Pauline S. Level of aspiration. In J.
McV. Hunt (Ed.), Personality and the
behavior disorders. Vol. I. New York:
Ronald Press, 1944.

Research report to the self-study committee.
Unpublished manuscript. Reed College,
1953.

Rotter, J. B. Social learning and clinical
psychology. New York: Prentice-Hall,
Inc., 1954.

Schultz, D. G., & Ricciuti, H. N. Level of
aspiration measures and college achievement.
J. gen. Psychol., 1954, 51, 207-21?

Sears, P. S. Levels of aspiration in academically
successful and unsuccessful children.
J. abnorm. soc. Psychol., 1940, 35,
498-536.


Received March 22, 1958.


Journal of Educational Psychology
Vol. 50, No. 2, 1959


SOME CHARACTERISTICS OF HIGH SCHOOL PUPILS
FROM THREE INCOME GROUPS


JOHN K. COSTER
Department of Education, Purdue University


In a recent study, a sample of 878 high
school pupils was divided into three income
groups, and responses to 27 items reflecting
attitudes toward high school were analyzed
to ascertain variations among income
groups. The attitudinal items pertained to
such objects as school, teachers, school
program, other pupils, and the value of
education. The groups varied significantly
on only eight of the 27 items. Greatest
variations were noted on items dealing
with interpersonal relationships (e.g.,
feeling that the other students like
them). There was little or no variation in
response to items which suggested an appraisal
of the school or school program,
such as the number of school subjects and
activities offered in the school, the grading
system, the administration of the school,
the methods of teaching, and school equipment
and facilities.

After the aforementioned study was completed,
an analysis was made of other
available data from the same sample to
compare the three income groups on several
items of personal information, such as
sex, schooling of parents, and participation
in school and community activities. The
purposes of the second analysis were to
identify characteristics on which pupils
from the three groups differed significantly,
and to compare affective behavior (denoted
by responses to attitudinal items)
with factual descriptions of the Ss.

Several of the characteristics of
pupils of diverse socioeconomic groups
have been reported in the literature.
Hollingshead, for example, found that
adolescents in the middle and lower-upper
classes were more likely to attend
athletic events, school dances, evening
plays and parties, and participate in
extracurricular activities than adolescents
in lower classes. Smith (1945) reported
a high degree of relationship between
participation in extracurricular activities
and Sims Socio-Economic scale scores. In
a study of junior high school pupils, 
Abrahamson (1952) found that middle 
class pupils participated in more extra- 
curricular activities, received more schol- 
arship awards, scored higher on social 
acceptance scales, held more student 
offices, and participated more frequently 
in student government, proportionately, 
than pupils in lower social classes. 

An investigation of factors related to 
social acceptance of pupils living in a small 
community, the population of which was 
expanded and changed radically when a 
war industry was located in the area, was 
conducted by Morgan (1946). His findings 
disclosed that the social position of pupils 
was stabilized rapidly, and that father’s in- 
come and parents’ education were the two 
most pertinent factors bearing on social 
acceptance and reputation. Keisler (1954) 
examined the relationship between pa- 
rental occupation and membership in 
YMCA and YWCA sponsored clubs, and 
reported that occupation of parents, grade 
point average, and scholastic aptitude were 
related significantly to membership in 
these clubs. 


PROCEDURE 


A questionnaire, containing 27 attitudi- 
nal items and a number of personal infor- 
mation items, was administered to approxi- 
mately 3000 pupils in nine Indiana high 
schools (Coster, 1955). A sample of 878 
cases was selected from the returns. The 




TABLE 1

NUMBER AND PERCENTAGE OF PUPILS IN
EACH INCOME GROUP

Number of items checked      Number and percentage
on House and Home scale      of pupils                 Income group
                               N         %
0-2                           219      24.9            Low
3-4                           558      63.6            Middle
5-7                           101      11.5            High
Totals                        878     100.0


sample included 100 questionnaires, se- 
lected randomly, from each of six larger 
schools, and all usable questionnaires from 
three smaller schools. 

The personal information items included
a “House and Home” scale, designed to
yield an indication of income level. This
scale has been used extensively by Remmers
and others in the Purdue Opinion
Panel studies to divide pupils into income
groups.¹ Remmers and Kirk (1953) and
Elias (1944) have reported on the validity
of the scale. The scale consisted of seven
items found in the home or provided for
the pupil.² Based on the number of items
checked by each respondent, Ss were divided
into high (five to seven items
checked), middle (three or four items),
and low (zero to two items) income
groups. The number and percentage of
pupils in each income group are shown in
Table 1.

¹ The Purdue Opinion Panel is published
periodically by the Division of Educational
Reference, Purdue University. Permission
to use the "House and Home" scale was
granted kindly by the publishers of the
Purdue Opinion Panel. Special acknowledgement
is due R. L. Horton, who suggested the
items used in this scale.

² The items were a vacuum cleaner; an
electric or gas refrigerator; a bath tub or
shower with running water; two automobiles
(excluding trucks); lessons in drama, art,
expression, dancing, or music provided outside
the school; an automatic dishwasher;
and a cabin or cottage for vacations.
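
The grouping rule for the "House and Home" scale can be written out as a short Python sketch. The function name and group labels are illustrative assumptions; the cut points (0-2, 3-4, 5-7 items) and the group counts are those given in the text and in Table 1.

```python
def income_group(items_checked: int) -> str:
    """Classify a respondent by the number of scale items checked (0-7)."""
    if not 0 <= items_checked <= 7:
        raise ValueError("the scale has seven items, so the count must be 0-7")
    if items_checked <= 2:
        return "low"
    if items_checked <= 4:
        return "middle"
    return "high"


# Group sizes from Table 1, used to recompute each group's share of the sample.
GROUP_COUNTS = {"low": 219, "middle": 558, "high": 101}
total = sum(GROUP_COUNTS.values())
percentages = {group: round(100 * n / total, 1) for group, n in GROUP_COUNTS.items()}

print(total)        # 878
print(percentages)  # {'low': 24.9, 'middle': 63.6, 'high': 11.5}
```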

The personal information items included 
such categories as sex, size of family, 
schooling of parents, hours spent working 
or studying at home, grade received in 
school, participation in high school and 
out-of-school activities, and offices held in 
organizations. Each item in the question- 
naire was followed with a list of appro- 
priate responses, one of which was to be 
checked. 

For each item a null hypothesis was
postulated: There is no difference in the
proportion of pupils checking each response
category when they are divided into
three income levels. Hypotheses were
tested by the chi-square technique with
tests based on a series of 3 x n contingency
tables. Adjacent response categories
were combined, whenever necessary, to
provide a minimum expected frequency
of five in each cell. Additional combinations
were made to conserve space in the
tables, where reductions in the number of
cells did not alter the character of the results.
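
The testing procedure described above can be sketched as follows. The counts are hypothetical (only the row totals match the three income groups), and the rule for choosing which adjacent categories to combine is an assumption, since the text says only that adjacent categories were combined whenever necessary.

```python
def expected_frequencies(table):
    """Expected cell counts under independence for a rows x columns table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_frequencies(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def combine_adjacent(table, minimum=5.0):
    """Merge adjacent response categories until every expected count >= minimum."""
    table = [list(row) for row in table]
    while len(table[0]) > 2:
        expected = expected_frequencies(table)
        col_min = [min(row[j] for row in expected) for j in range(len(table[0]))]
        worst = min(range(len(col_min)), key=col_min.__getitem__)
        if col_min[worst] >= minimum:
            break
        # Assumed rule: merge the weakest column with its right-hand neighbour,
        # or with its left-hand neighbour when it is the last column.
        i = worst - 1 if worst == len(col_min) - 1 else worst
        table = [row[:i] + [row[i] + row[i + 1]] + row[i + 2:] for row in table]
    return table


# Hypothetical 3 x 4 table: rows are high, middle, and low income groups.
observed = [
    [30, 40, 25, 6],
    [150, 250, 140, 18],
    [90, 80, 45, 4],
]
combined = combine_adjacent(observed)
statistic, df = chi_square(combined)
print(combined, round(statistic, 2), df)
```

In this example the last response category has an expected frequency below five in the high income row, so it is merged with its neighbour and the test is run on a 3 x 3 table with four degrees of freedom.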


RESULTS


Analyses were made of 23 personal
information items. In marked contrast to
the relative homogeneity of responses to
the items in the attitudinal study (Coster,
1958), the pupils in the three groups varied
sharply on two thirds of the personal
information items. A total of 16 hypotheses
were rejected, 13 at the .001 level of
confidence, and one each at the .01, .02, and
.05 levels.

For the report of the results, the items
are grouped into four divisions: Personal
Information, Success in School, Participation
in School and Community Activities,
and Work or Study.


PUPIL CHARACTERISTICS AND INCOME LEVEL 57 


Personal Information


The personal information items include
sex, broken homes, size of family, schooling
of father and mother, and employment of
mother outside the home. Most of the
items in this division are associated with
income level or socioeconomic status. Results
of tests of significance are shown in
Table 2. Three of the six hypotheses were
rejected at the .001 level of confidence, one
was rejected at the .02 level, and two hypotheses
failed to be rejected.

Size of family and schooling of parents
varied significantly with income level, with
P < .001. Computed chi-square values
exceeded the value required for significance
at this level. With regard to
size of family, high income families were
more than twice as likely to have one to
three children as low income families, and

low income families were more than twice 
as likely to have four or more children as 
high income families. Concerning schooling 
of parents, variations were noted at each 
level of education, with greatest variations 
observed in the incidence of college attend- 
ance. Approximately one of four high in- 
come fathers attended college, as com- 
pared with one middle income father in 
eight, and one low income father in 30. 
Mothers varied similarly. One high income 
mother in four attended college, as com- 
pared with one middle income mother in 
seven, and one low income mother in 30. 

There were variations in the percentage 
of boys and girls in each income group, 
and the attending hypothesis was rejected 
at the 2% level. The sample included 450 
boys and 428 girls, but the difference in 
proportion of boys and girls was not sig- 
nificant. The percentage of boys exceeded 


TABLE 2
RESULTS OF TESTS OF SIGNIFICANCE OF PERSONAL INFORMATION ITEMS, SHOWING
PERCENTAGE OF PUPILS CHECKING RESPONSE CATEGORIES BY INCOME GROUPS


                                      Income group

Item                                High | Middle | Low  | Total | P
1. Sex
   a. Boys                          48.5 | 54.8 | 43.4 | 51.3
   b. Girls                         51.5 | 45.2 | 56.6 | 48.7 | .02
2. With whom do pupils live?
   a. With father and mother        85.0 | 84.2 | 77.6 | 82.7
   b. Other                         15.0 | 15.8 | 22.4 | 17.3 | .10
3. Number of children in family
   a. One to three children         68.3 | 52.7 | 29.6 | 49.5
   b. Four or more children         31.7 | 46.2 | 70.3 | 50.5 | .001
4. Schooling of father
S pine 56.0 | 39.0 
18.8 | 36.2 7 
52.5 | 50.8 | 40.8 | 48.5 
5. 28.7 | 12.9 | 3.2 | 12.4 | .001 
izh 9.9 | 21.8 | 44.4 25.0 
Sch i 4.5 | 52.2 à 
$. College.. dtr E a i 257 3 | 33 | 12.5 | o 
a. Tes. employed outside of home. sie | coe: | eal ate 
PSAE Dot iar i 68.3 | 67.6 | 76.1 | 69.8 | .10 




the percentage of girls in the middle in- 
come group, whereas the percentage of 
girls exceeded the percentage of boys in 
the high and low groups. Additional chi- 
square tests were made, comparing the 
high group with the others, the high and 
middle groups, the low group with the 
other two groups, and the low and middle 
groups. A significant chi-square value was 
obtained when the low group was com- 
pared with either the middle group, or 
with the other two groups. These data 
probably suggest a higher dropout rate 
among low income boys than low income 
girls. 
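
The chi-square test behind the sex item can be sketched in a few lines. The cell counts used here are not reported in the article; they are illustrative values chosen on the assumption that the three income groups held roughly 101, 558, and 219 pupils, which reproduces the printed percentages and the 450 boys and 428 girls. The function names are hypothetical as well.

```python
import math


def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom
    for a two-way table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variate with exactly 2 df.
    (For 2 df the survival function reduces to exp(-x / 2).)"""
    return math.exp(-stat / 2.0)


# ASSUMED counts (boys, girls) for the high, middle, and low income groups.
# They are back-calculated from the printed percentages, not taken from the article.
SEX_BY_INCOME = [[49, 52], [306, 252], [95, 124]]

stat, df = chi_square(SEX_BY_INCOME)
p = p_value_df2(stat)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")
```

Under these assumed counts the probability falls between .01 and .02, which is in line with the rejection "at the 2% level" reported in the text.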

A higher percentage of pupils from the low income group than from the
high and middle groups lived with only one parent, a step-parent, or
other relatives or persons. The differences in proportions, however,
were not sufficiently large to be statistically significant. The data
in Table 2 also indicate that a higher percentage of high and middle
income mothers are employed outside the home, but, again, the
differences were not significant.


Success in School 


The “Success in School” items, listed in Table 3, included queries on
plans to continue education, failure in school, and average marks. Six
hypotheses were tested, and five were rejected at the .001 level. The
proportion of pupils who indicated that they plan to graduate from high
school and go to college was significantly higher for the high group
than for the other two groups. There were more failures in grade school
and high school subjects for the low income group than the other two
groups. Differences in grade school failures were significant at the
.001 level. The three groups, however, were more homogeneous with
regard to failure in high school subjects, with P < .05.


TABLE 3
RESULTS OF TESTS OF SIGNIFICANCE OF SUCCESS IN SCHOOL ITEMS, SHOWING
PERCENTAGE OF PUPILS CHECKING RESPONSE CATEGORIES, BY INCOME GROUP

                                              Income Group
Item                                  | High | Middle | Low  | Total  | P
7. Plan to graduate from high school?
   a. [Yes]                           | 99.0 | 96.4   | 89.0 | 94.9   |
   b. Undecided or no                 | 1.0  | 3.6    | 11.0 | 5.1    | .001
8. [Plan to go to college?]
   a. Yes                             | 49.5 | 32.1   | 15.5 | 30.0   |
   b. Undecided or no                 | 50.5 | 67.9   | 84.5 | 70.0   | .001
9. Years failed in grade school?
   a. None                            | 94.1 | 86.1   | 77.6 | 84.9   |
   b. [One or more]                   | 5.9  | 13.9   | 22.4 | 15.1   | .001
10. [Subjects failed in high school?]
   a. None                            | 93.1 | [87.9] | 82.8 | [87.3] |
   b. [One or more]                   | 6.9  | 12.1   | 17.2 | 12.7   | .05
11. [Average marks; category labels illegible]
   a. [?]                             | 71.3 | 49.4   | 31.2 | 47.4   |
   b. [?]                             | 28.7 | 50.5   | 68.8 | 52.6   | .001
12. [Name has appeared on school honor roll?]
   a. Yes                             | 42.4 | 35.3   | 20.3 | [32.7] |
   b. [No]                            | 57.6 | 64.7   | 79.7 | 67.3   | .001
13. Field of specialization.
   a. College preparatory or academic | 34.0 | 19.1   | 9.0  | [18.5] |
   b. General                         | 13.4 | 17.7   | 13.2 | 16.2   |
   c. Vocational                      | 52.6 | 63.2   | 77.8 | 65.3   | .001
14. [Selected life's work?]
   a. Yes                             | 42.6 | 34.9   | 32.5 | 35.2   |
   b. [No]                            | 57.4 | 65.1   | 67.3 | 64.8   | .30


Scholastic success, as measured by average marks in school, varied
markedly among the three groups. High income pupils were more than
twice as likely to report A or B grade averages than low income pupils.
Similar variations were noted when pupils were asked whether their
names have appeared on their school honor rolls.

Related to the success in school items are the questions on field of
specialization and selection of life's work. Results of tests of
significance for these two items are included in Table 3. Field of
specialization varied significantly with income level, with P < .001. A
larger percentage of high income pupils than middle or low income
pupils were enrolled in the college preparatory or academic
curriculums, whereas low income pupils were attracted in greater
proportions to one of the vocational programs. Among the three groups,
there was very little difference in the percentages of pupils who
elected a general program. The relatively high percentage of pupils who
were enrolled in a vocational program (including industrial arts in
addition to vocational agriculture, homemaking, [illegible], and
industrial education) provided an indication of limited curricular
offerings in small schools. Practically all boys in the smaller schools
were enrolled in vocational agriculture, and a large proportion of
girls in the smaller schools studied homemaking for at least two years.


Approximately one third of the Ss indi- 
cated that they definitely had selected 
their life’s work. A higher percentage of 
the “Yes” responses was from the high 
income group than from the other two 
groups, but the differences were not sig- 
nificant. 


Participation in School and Community 
Activities 

All five hypotheses pertaining to school 
and community activities were rejected, 
four at the .001 level and one at the .01 
level, according to data in Table 4. One 
indication of participation in school ac- 
tivities—and social acceptance—is the ex- 
tent to which youths associate with fellow 
pupils. A significantly higher proportion 
of low income pupils, than high and middle 
income pupils, reported that they had 
most of their social activities outside of 
school with youths who go to other schools 
or who are not in school. 

Extremely wide variations occurred in 
the number of high school and out-of- 
school activities in which pupils of the 
three groups participated. Nearly ten 
times as many low income as high income 
pupils did not participate. Whereas nearly 
three fourths of the high income pupils 
reported that they participated in three 
or more high school activities, only one 
fourth of the low income group reported 
such extensive participation. And whereas 
nearly two thirds of the high income pu- 
pils indicated that they had participated 
in three or more out-of-school activities, 
less than one fifth of the low income group 
indicated similar participation. A diver- 
gence was also noted in the percentages of 
pupils who have been elected to an office. 
More than three fourths of the high in- 
come pupils and more than two thirds of 
the middle income pupils have been elected 
to an office in a high school or out-of- 
school activity, as compared with only 
40% of the low income group. 


Differences in the behavior of youths were noted in extent of Sunday
School and Church attendance. Slightly more than three fourths of the
high income pupils and over two thirds of the middle income pupils
reported regular Sunday School and Church attendance, as contrasted
with slightly more than one half of the low group. Fifteen per cent of
the low income pupils indicated that they never attended Sunday School
or Church, as compared with less than five per cent of the middle group
and only one per cent of the high group.


TABLE 4
RESULTS OF TESTS OF SIGNIFICANCE OF PARTICIPATION IN SCHOOL AND
COMMUNITY ACTIVITIES ITEMS, SHOWING PERCENTAGE OF PUPILS
CHECKING RESPONSE CATEGORIES, BY INCOME GROUPS

                                                      Income Group
Item                                          | High | Middle | Low  | Total | P
15. With whom do pupils associate?
   a. With pupils in the same high school     | 82.2 | 84.1   | 74.0 | 81.4  |
   b. With youth in other high schools or
      not in school                           | 17.8 | 15.9   | 26.0 | 18.6  | [.01]
16. Participation in high school activities.
   a. [None]                                  | 2.0  | 9.5    | 19.2 | 11.1  |
   b. [One or two]                            | 25.8 | 44.2   | 55.7 | 45.0  |
   c. [Three or more; label illegible]        | 35.6 | 29.5   | 21.0 | 28.1  |
   d. [Three or more; label illegible]        | 36.6 | 16.7   | 4.1  | 15.8  | .001
17. [Participation in out-of-school activities.]
   a. [None]                                  | 3.0  | 11.8   | 27.9 | 14.5  |
   b. [One or two]                            | 32.0 | 49.1   | 52.5 | 48.1  |
   c. [Three or more; label illegible]        | 39.6 | 30.9   | 17.8 | 28.5  |
   d. [Three or more; label illegible]        | 24.7 | 8.6    | 2.3  | 8.9   | .001
18. [Elected to an office?]
   a. [Yes]                                   | 78.0 | 69.7   | 40.2 | 63.6  |
   b. [No]                                    | 22.0 | 30.3   | 59.8 | 36.4  | .001
19. Frequency of [Sunday School and Church] attendance.
   a. Three or more times per month           | 78.0 | 66.8   | 52.1 | 64.5  |
   b. Less than three times a month           | 21.0 | 28.7   | 32.8 | 28.8  |
   c. [Never]                                 | 1.0  | 4.5    | 15.0 | 6.7   | .001

Work and Study 

The fourth division pertains to an analysis of income group variation
with regard to the number of hours spent working or studying outside of
school. This aspect of the study has received limited attention in
other investigations of income level and social status. To get the
necessary information, pupils in the sample were asked these questions:

1. About how many hours do you study at home each week?
2. About how many hours each week (during the school year) do you work
at home?
3. Do you receive pay or allowance for the work you do at home?
4. About how many hours each week do you work at a job away from home
during the school year from which you earn some money?

The results of tests of significance are shown in Table 5. High income
pupils tended to spend more time studying at home than low and middle
income pupils, and middle income pupils were more likely to receive pay
and allowance for the work they do at home than youths in the other two
groups, but the differences were not significant. There was practically
no difference among the three groups in the number of hours per week
that pupils worked outside of school, either at home or away from home.


TABLE 5
RESULTS OF TESTS OF SIGNIFICANCE OF WORK OR STUDY ITEMS, SHOWING
PERCENTAGE OF PUPILS CHECKING EACH RESPONSE, BY INCOME GROUP

                                                      Income Group
Item                                          | High   | Middle | Low  | Total | P
20. Hours pupils study at home each week.
   a. Less than five hours per week           | 62.4   | 73.4   | 70.4 | 71.5  |
   b. Five hours or more per week             | 37.6   | 26.6   | 29.6 | 28.5  | .10
21. Hours pupils work at home each week.
   a. Less than ten hours per week            | 61.4   | 61.6   | 59.8 | 61.1  |
   b. Ten hours or more per week              | 38.6   | 38.3   | 40.3 | 38.9  | .90
22. Receive pay or allowance for work at home.
   a. Yes                                     | 42.6   | 50.9   | 42.7 | 47.9  |
   b. No                                      | [57.4] | 49.1   | 57.3 | 52.1  | .10
23. Hours pupils work outside of home each week.
   a. Less than ten hours per week            | 73.3   | 79.0   | 75.9 | 77.6  |
   b. Ten hours or more per week              | 26.7   | 20.9   | 24.1 | 22.4  | .50


Discussion 


Some general comments based on a synthesis of the findings of the two
studies presented in this paper and in the previous article (Coster,
1958) are here appropriate. High income pupils responded more favorably
than middle or low income pupils to the majority of items in both
studies. However, there were marked variations in the number of items
for which null hypotheses were rejected. In the present study,
significant differences were obtained for the majority of the items.
[Responses] to attitudinal items which [called for] appraisal of the
school and [the school program] generally did not manifest these marked
differences in school and community experiences.

There are indications that the low in- 
come pupil has tended to accept his lot in 
school in a state of resignation. Perhaps 
he is conscious of the efforts of his school 
to provide an appropriate educational 
program for him. The findings show, nev- 
ertheless, that he participates in few school 
activities, yet complained no more about 
the number of activities than high or 
middle income pupils. He tended to re- 
ceive low grades in school, yet he was as 
satisfied with the grading system as high 
and middle income pupils whose average 
grades were much higher. And his opinions 
of the subjects offered in the school did 
not differ from the opinions of pupils from 
higher income homes, yet the low income 
pupil is more likely to deprecate his 
chances of getting the kind of job he wants 
after high school even though there is a 
greater possibility that he is enrolled in a
vocational program designed to fit him
for useful employment.

Responses to attitudinal items dealing
with social acceptance, in contrast, varied
similarly with the school and community 
experience items reported in this study. 
It would seem, perhaps, that low income 
pupils recognize that there are activities 
in which they may participate, but that 
either they do not wish to participate, 
possibly because of variations in social 
values attached to these activities, or they 
feel that they are not welcome to join. 
The findings of these two studies tend 
to suggest that although schools may sub- 
scribe to a position of equality of educa- 
tional opportunity, equality of educational 
experiences falls somewhat short of the 
ideal. Even though a predominant theme 
of the contemporary philosophy of Ameri- 
can education is to provide appropriate 
educational opportunities for all Ameri- 
can youth, sociological and psychological
factors tend to operate against the attain- 
ment of this goal. Then, too, criticisms to 
the effect that American education has 
been more concerned with the education 
of the "masses" at the expense of the 
"elite" are more apropos when leveled at 
theory than at practice. The findings from 
these and other studies of social status 
and income level suggest that American 
schools, consciously or unconsciously, have 
been prone to favor pupils from middle 
and upper socioeconomic strata which, 
generally, include a high proportion of 
academically competent youth. 


Summary 


In this study, 878 pupils were divided into three income groups, and
the pupils in the groups were compared on a number of items of personal
information, such as sex, schooling of parents, and participation in
school and community activities. High income pupils were more likely
than middle and low income pupils to participate in high school and
out-of-school activities, hold an office in an organization, get high
marks in school, be named to the school honor roll, attend Sunday
School and Church regularly, successfully complete courses in school,
and continue education. The number of hours spent studying or working
outside of school, during the school year, however, did not differ
among the groups. The number of items related to school and community
experience which varied significantly with income level contrasted
markedly with the relative homogeneity of responses of the three groups
to attitudinal items pertaining to the school and school program, as
reported in a previous study of the same Ss (Coster, 1958). But there
seemed to be an essential agreement between differences in responses to
attitudinal items dealing with social acceptance and the extent of
participation in school and community activities.


REFERENCES 


Abrahamson, S. Our status system and scholastic rewards. J. educ.
Sociol., 1952, 25, 441-450.

Coster, J. K. Factors related to morale in secondary schools.
Unpublished doctoral dissertation, Yale Univer., 1955.

Coster, J. K. Attitudes toward school of high school pupils from three
income levels. J. educ. Psychol., 1958, 49, 61-66.

[Ents], G. A study of certain methods of attitude measurement and
related variables. Unpublished master's thesis, Purdue Univer., 1944.

Hollingshead, A. B. Elmtown's youth. New York: John Wiley, 1949.

Keislar, E. R. Differences among adolescent social clubs in terms of
members' characteristics. J. educ. Res., 1954, 48, 297-303.

Morgan, H. G. Social relationships of children in a war-boom
community. J. educ. Res., 1946, 40, 271-286.

Remmers, H. H., & Kirk, R. B. Scalability and validity of the
socio-economic status items of the Purdue Opinion Panel. J. appl.
Psychol., 1953, 37, 384-386.

Smith, H. P. A study of the selective character of American education:
Participation in school activities as conditioned by socio-economic
status and other factors. J. educ. Psychol., 1945, 36, 229-246.


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 2, 1959


MEASUREMENT CHARACTERISTICS OF RECALL IN RELATION TO
THE PRESENTATION OF INCREASINGLY LARGE
AMOUNTS OF MATERIAL¹
WILSE B. WEBB² AND MARVIN SCHWARTZ³
U. S. Naval School of Aviation Medicine

In [testing] for differences in the amount [of information] individuals
possess about a given [body] of information a well-known [relationship]
exists: the greater the number of [items of] information tested for,
the [greater] the consistency in the measurement of [these]
differences. This is neatly summarized in the Spearman-Brown [formula].
This is not the point of [this] paper. Our problem, so far as we can
tell, has not received attention in the [study] of individual
differences. We are concerned with the relationship between the
consistency of measurement of [scores] obtained by individuals and the
[number] of items of information presented the [S] at one presentation.
In [simple] terms, would there be greater consistencies of differences
between individuals [who] were presented, for example, [five] items of
information and tested [on five] items or between individuals presented
50 items of information and tested on five of these items?
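
The Spearman-Brown relation mentioned in the introduction can be written as a one-line function. This is a minimal sketch of the standard textbook formula, not code from the article; the function name and the example values are illustrative assumptions.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability of a test lengthened by `length_factor`,
    given the reliability of the original test (standard Spearman-Brown form)."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


# Illustrative values only: a test with reliability .50 that is doubled in length.
doubled = spearman_brown(0.50, 2)
print(round(doubled, 3))
```

The formula describes what happens when more items are tested. The article asks a different question: what happens when more material is presented before a fixed set of items is tested.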


PROCEDURE

Materials Presented

[Six] stories, averaging 741 words, edited from Bulfinch's Mythology
(1931), were used for the information presented the Ss. These stories
possessed the advantages of not only being generally unfamiliar to the
Ss, but in addition, each contained considerable detail which permitted
the relatively easy development of comprehension tests (Webb & Wallon,
1956).


¹ Opinions and conclusions contained in this report are those of the
authors. They are not to be construed as necessarily reflecting the
views or the endorsement of the Navy Department.

² Now at University of Florida.

³ [Now at] State University of Iowa.


Tests Used 


Forty-five true-false test items were em- 
ployed. These items were chosen by item 
analyzing a considerably larger pool of 
items, eliminating nondiscriminative items 
(Webb & Wallon, 1956). 


Subjects 

The Ss in this experiment were 345 
naval aviation cadets. In general they pos- 
sessed a minimum of two years of college 
and were further screened on an intelli- 
gence type test prior to entering naval 
aviation. 


Method of Presentation 

The Ss (ranging in N from 31-69) in 
each group were distributed a set of stories 
and questions face down on their desks. 
They were told to read the stories and 
answer the questions as if they were listen- 
ing to the story. They were told to read 
steadily forward at a comfortable rate, not 
going back or pausing on either the stories 
or the questions. 


EXPERIMENTAL GROUPS 


Group I 
The questions relevant to each paragraph of three stories were
inserted immediately behind each paragraph. The task of the S was
merely to read a single paragraph and to reproduce the information
obtained immediately afterwards.


Group II 


For this group the questions were be- 
hind each of the three stories. These Ss 
read approximately 700 words before be- 
ing questioned on the information con- 
tained therein. 


Group III 


This group read all three stories with 
the questions appearing at the end of the 
group of three stories. Here the Ss read 
approximately 2200 words before answer- 
ing the questions. 


Group IV 


Six stories were presented this group 
before the questions about the stories were 
presented. 

In the first three groups the same three stories and 45 test items were
involved. In the fourth group the addition of three different stories
and sets of questions was required. The stories common to the previous
three groups appeared first, as did the common test items when the
questions were asked.


RESULTS


The basic data of this experiment, given in Table 1, include the means,
Ss variance, error variance, and the resultant Hoyt homogeneity
estimate of reliability for the 45 items in common between the four
groups.


TABLE 1
RESULTS OF HOYT ANALYSIS OF VARIANCE
ESTIMATE OF RELIABILITY

Group | N   | Mean  | Ss Variance | Error Variance | Reliability
I     | 69  | 40.14 | .135        | .084           | .378
II    | 127 | 38.20 | .343        | .102           | .703
III   | 90  | 36.65 | .454        | .119           | .740
IV    | 59  | 34.07 | .458        | .150           | .672
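
The reliability column of Table 1 can be checked against the two variance columns. The sketch below assumes the usual form of Hoyt's analysis-of-variance estimate, one minus the ratio of error variance to between-Ss variance; the article does not print the formula, and the names used are hypothetical.

```python
# (Ss variance, error variance, printed reliability) for each group in Table 1.
TABLE_1 = {
    "I": (0.135, 0.084, 0.378),
    "II": (0.343, 0.102, 0.703),
    "III": (0.454, 0.119, 0.740),
    "IV": (0.458, 0.150, 0.672),
}


def hoyt_reliability(ss_variance, error_variance):
    """Hoyt's estimate, assumed here to be 1 - (error variance / Ss variance)."""
    return 1.0 - error_variance / ss_variance


computed = {
    group: round(hoyt_reliability(ss_var, err_var), 3)
    for group, (ss_var, err_var, _printed) in TABLE_1.items()
}

for group, (_ss, _err, printed) in TABLE_1.items():
    print(group, computed[group], printed)
```

Recomputed this way, every group agrees with the printed estimate to within rounding of the three-decimal variances.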




Discussion 


Our data indicate that an increase in the information presented
resulted in an increase in both the between Ss variance on test
materials and the error variance. In our experimental situation the
error variance tends to increase in positively accelerated fashion
whereas the Ss variance appears to increase in a negatively accelerated
fashion. Because the Ss variance is considerably greater than the error
variance, this results in an increasing reliability estimate through
the first three conditions but a decreased estimate in the fourth
condition.

It is likely that the critical factor in the increasing Ss variance is
the increase in item difficulty and hence the discrimination between
Ss. That there was such an increase in item difficulty can be seen from
the means given in Table 1. Whether in addition the increase in
difficulty resulted in a change in the factor structure and a resultant
increase in the individual differences cannot be tested in this design
but represents a real possibility.

The increased error variance is probably attributable to a number of
factors. In extending the amount of information which must be learned,
there is an increase in the probability of fluctuations in attention,
variations in amounts of previous information, methods of learning,
proactive and retroactive inhibition effects, and failures to recall
leading to guessing behavior. All of these factors could result in
errors in response unrelated to the [true] S variance and should
increase as the amount of information which an S must learn and recall
increases.

Our results imply that if Ss are presented a limited amount of material
and immediately tested for comprehension, although the error variance
will be low, so too will be the S variance and hence the measure will
have a limited reliability. As the amount of information to be learned
is increased, the between Ss variance will increase only up to a point
but the error variance will continue to increase. Because of this, it
follows that beyond a given amount of information, the classical
estimates of reliability of measurement of information comprehended
will decrease.
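
The contrast between the two growth patterns can be made concrete by taking successive differences of the variances in Table 1. This is only a rough sketch: it steps from one condition to the next and ignores that the conditions are not equally spaced in amount of material. The helper name is hypothetical.

```python
# Variances for Groups I through IV, taken from Table 1.
SS_VARIANCE = [0.135, 0.343, 0.454, 0.458]
ERROR_VARIANCE = [0.084, 0.102, 0.119, 0.150]


def first_differences(values):
    """Change from each condition to the next, rounded to three decimals."""
    return [round(later - earlier, 3) for earlier, later in zip(values, values[1:])]


ss_steps = first_differences(SS_VARIANCE)
error_steps = first_differences(ERROR_VARIANCE)

print("Ss variance steps:   ", ss_steps)
print("Error variance steps:", error_steps)
```

The steps in Ss variance shrink from one condition to the next, while the largest step in error variance is the last one. That is the pattern the discussion describes as negatively versus positively accelerated growth.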


SUMMARY 


Ss were permitted to read complex stories approximately 700 words in
length. In one group, questions about the stories were inserted behind
each paragraph, in a second group the questions appeared behind each
story, in a third group the questions followed three stories and in the
fourth group the questions followed six stories. It was found that the
Ss' variance increased in a negatively accelerated fashion with
increasing amount of materials being presented prior to testing.
However, the error variance also increased but in a linear or
positively accelerated fashion. This relationship resulted in the
reliability estimate of the scores increasing through the first three
conditions and then decreasing.


REFERENCES 


Bulfinch, T. Mythology. New York: J. M. Dent, 1931.

Webb, W. B., & Wallon, E. J. Comprehension by reading versus hearing.
J. appl. Psychol., 1956, 40, 237-240.


Received August 12, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 2, 1959


PERSONALITY SYNDROMES AND ACADEMIC ACHIEVEMENT
GEORGE MIDDLETON, JR.¹ AND GEORGE M. GUTHRIE


Pennsylvania State University


Attempts to predict academic achievement have utilized three classes of
variables: intelligence test scores, indices of previous achievement,
and measures of personality and other nonintellective factors. To date,
aptitude test scores combined with an index of high school performance
in a multiple regression equation have yielded the best estimates of
the grade point average. A ceiling of about .70 has been reached in
these efforts, however, with most Rs in the .50's. Attempts to improve
prediction by using nonintellective factors, such as interest and
personality traits, have yielded quite discouraging results. The
principal sources of difficulty appear to lie in the heterogeneity of
the criterion and the nonsummative and nonlinear properties of many
promising predictors.

It is worthwhile to examine the most frequently used criteria, the
grade point average and pass-fail. While they provide a convenient
continuum or dichotomy, they are reached by subjects by many different
routes. Ghiselli (1956) has discussed the parallel problem that arises
from the use of one over-all numerical rating as a criterion of
performance in an industrial setting. Averaging grades frequently
entails averaging uncorrelated values. One behavior pattern, such as
conformity, may be rewarded in one course and another, such as
independence, in a different course. The availability and the social
significance of the averages on transcripts have encouraged persons
doing prediction studies to overlook the heterogeneous origins of the
values which contribute to these criterion scores.


¹ Now at McNeese State College, Lake
Charles, Louisiana.


Even if we are predicting achievement in a homogeneous subject matter
area, or for a single course, we must take into account the several
sets of motivational structures which further or impede achievement.
Worrying, for example, may facilitate one student's work and cause
another to fail. Some light on this can come from the systematized
observations of persons with similar achievement but different
abilities and of those with similar aptitude scores and very divergent
performance. It is clear that there are many patterns of combining
aptitudes, methods of studying, and motivation to achieve a similar
grade point average. This has led some clinicians to resort to
nonmathematical combining of predictors in an effort to reach a more
satisfying global assessment of the individual's likelihood of success.
For all that this method may provide more convincing after-the-fact
explanations of success or failure, Meehl's summary of statistical and
clinical prediction (1954) shows that combining indices “in our heads”
does not yield as good results as even relatively simple statistical
methods. And neither do very well!

In a break with traditional design, Frederiksen and Melville (1954)
were able to demonstrate that correlations of a predictor variable may
be increased in some cases by including a moderator variable, a
variable which bears no relationship to the criterion. Specifically,
they demonstrated that taking out compulsive Ss will improve prediction
of the other Ss, even though compulsiveness is not correlated with
either the predictor or the criterion. Saunders (1956) has presented a
mathematical basis of moderated regression and further [examples] of
its operation. This work suggests that personality factors may be
associated with achievement but not necessarily in a linear or
curvilinear relationship. Nor is the relationship that of the usual
suppressor variable. Rather, there may be at least one subgroup, and
probably more than one, characterized by certain personality
characteristics, in which the relationship between intelligence and
achievement is different from that among those not in the subgroup.
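
The moderator idea can be illustrated with a toy data set. Everything in this sketch is invented for illustration: the ten records, the `compulsive` flag, and the function names are assumptions, and nothing here reproduces the Frederiksen and Melville data. The flag is built so that it is uncorrelated with both the predictor and the criterion, yet the predictor-criterion correlation differs sharply between the two subgroups.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# INVENTED records: (predictor score, criterion score, compulsive flag).
RECORDS = [
    (1, 1, False), (2, 2, False), (3, 3, False), (4, 4, False), (5, 5, False),
    (1, 3, True), (2, 5, True), (3, 1, True), (4, 4, True), (5, 2, True),
]


def predictor_criterion_r(records):
    """Correlation of predictor with criterion within a set of records."""
    return pearson_r([r[0] for r in records], [r[1] for r in records])


r_all = predictor_criterion_r(RECORDS)
r_noncompulsive = predictor_criterion_r([r for r in RECORDS if not r[2]])
r_compulsive = predictor_criterion_r([r for r in RECORDS if r[2]])

# The moderator itself is unrelated to predictor and criterion in this toy set.
flags = [1 if r[2] else 0 for r in RECORDS]
r_flag_predictor = pearson_r(flags, [r[0] for r in RECORDS])
r_flag_criterion = pearson_r(flags, [r[1] for r in RECORDS])

print(round(r_all, 2), round(r_noncompulsive, 2), round(r_compulsive, 2))
```

In this constructed example the pooled correlation is modest, while the correlation among the non-flagged records is much higher. This is the sense in which setting a subgroup aside can improve prediction for the remainder.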

[A method] of obtaining groups of similar [persons] has been
demonstrated by [name illegible] (1953, pp. 190-218). By in[verse
factor] analysis he separated high and [low achievers] on a series of
items regarding study habits. This approach is discussed by Stern,
Stein and Bloom who provided the [stimulus] for this study. Since the
same criteria [score] can be achieved in a variety of ways, the
multiple regression model may be inappropriate. This study utilizes
their [suggestion], “A possible technique for dealing with
non-homogeneous populations [is offered] by transposed factor analysis”
(Stern, Stein, & Bloom, 1956, p. 235). In this [way] it may be possible
to identify homogeneous subgroups with different personality syndromes
and then to seek the factors leading to different levels of academic
achievement.


MzrTHOD 


An attempt has been made to delineate the personality syndromes among
high achieving and low achieving students. The Ss were students in a
single curriculum. From a pool of about 50 business management students
who had attained at least junior standing, 14 with averages of [2.50]
or higher where A is 4.00, and 14 with averages below 2.00 were
selected. These groups had done equally well in high school, but the
high group had significantly higher scores on a college aptitude test.
The Ss were given a 300-item personality questionnaire, Table A, made
up of items drawn from Murray (1938, pp. 142-242). These items had been
designed to measure 18 of the needs in Murray's system. Each item was
answered true or false. Examples of the items are:


I enjoy psychological novels more than other kinds of literature.
I became very attached to my friends.
I am intolerant of people who bore me.

Adjusted phi coefficients were computed 
between each pair of Ss. The 14 × 14
matrices for high and low achievers were 
factor analyzed, and the rotated factor 
loadings were correlated with the scores on 
each of Murray's 18 provisional scales to 
permit interpretation of the factors. 
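
The person-by-person correlation step can be sketched as follows. The code computes the ordinary phi coefficient between two Ss' true-false answers and assembles the Ss × Ss matrix that would then be factor analyzed. The article's "adjusted" phi involves a correction it does not spell out, so that step is left out. The function names and the four-item answer sheets are hypothetical.

```python
import math


def phi_coefficient(answers_a, answers_b):
    """Plain (unadjusted) phi between two persons' true-false answers."""
    pairs = list(zip(answers_a, answers_b))
    n11 = sum(1 for a, b in pairs if a and b)          # both answered true
    n10 = sum(1 for a, b in pairs if a and not b)      # only the first true
    n01 = sum(1 for a, b in pairs if not a and b)      # only the second true
    n00 = sum(1 for a, b in pairs if not a and not b)  # both answered false
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        raise ValueError("phi is undefined when a person gives one answer throughout")
    return (n11 * n00 - n10 * n01) / denominator


def person_correlation_matrix(answer_sheets):
    """Square matrix of phi coefficients between every pair of persons."""
    return [
        [phi_coefficient(row, column) for column in answer_sheets]
        for row in answer_sheets
    ]


# Hypothetical answers of three Ss to four items (True = "true").
SHEETS = [
    [True, True, False, False],
    [True, False, True, False],
    [False, False, True, True],
]
matrix = person_correlation_matrix(SHEETS)
```

With 14 Ss in a group, the same call would give the 14 × 14 matrix described in the text, each row and column standing for a person rather than a test.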


RESULTS 


The results are shown in Table 1 and in
Tables B through K.² Five factors were
extracted from the matrix for high
achievers. Ten of 14 Ss showed loadings of
at least .40 on one or more factors. These
factors were interpreted by referring to
their correlations with the 18 provisional
scales. The factors are described in terms
of the needs in Murray's system which cor-
related at least .40 with the factor.

Factor H-I. This factor correlates posi- 
tively with nurturance and dominance and 
negatively with abasement, succorance and 
narcissism. Achievement for these Ss ap- 
pears to mean power and approval. 

Factor H-II. Autonomy, aggression, 
counteraction, and achievement correlate 
positively, while abasement and affiliation 
are negatively correlated. Achievement for
this group seems to be an expression of re-
sentment and independence.


² Tables A-K have been deposited with the American Documentation
Institute. Order Document No. 5854, remitting $2.00 for 35 mm microfilm
or $3.75 for 6 by 8 in. photocopies. Order from ADI Auxiliary
Publication Project, Photoduplication Service, Library of Congress,
Wash. 25, D.C. Make checks payable to Chief, Photoduplication Service,
Library of Congress.


TABLE 1
CORRELATIONS BETWEEN ROTATED FACTORS AND PROVISIONAL SCALES FOR
HIGH AND LOW ACHIEVERS

              |          High Achievers          |      Low Achievers
Needᵃ         | H-I  | H-II | H-III | H-IV | H-V  | L-I  | L-II | L-III | L-IV
[illegible]   |  .25 | -.13 |  .47  |  .09 | -.46 |  .60 |  .17 | -.61  | [?]
n-Agg         | -.17 |  .51 | -.03  | -.28 |  .43 |  .36 | -.22 | -.35  | [?]
n-Sen         | -.04 | -.41 |  .63  | -.26 | -.23 |  .83 | -.10 | [?]   | [?]
n-Exh         | [?]  | -.45 |  .78  | -.05 | -.35 |  .47 | -.21 | [?]   | [?]
Extraception  |  .10 |  .27 | -.31  |  .49 | -.05 | -.34 |  .45 | [?]   | [?]
n-Aff         |  .15 | -.45 |  .53  |  .36 | -.78 |  .15 |  .41 | [?]   | [?]
n-Rej         |  .23 |  .32 | -.15  | -.09 |  .15 |  .20 | -.43 | [?]   | [?]
Narcissism    | -.52 |  .00 |  .22  | -.33 |  .31 |  .64 | -.38 | -.68  | [?]
n-Suc         | -.52 | -.31 |  .80  |  .00 | -.42 |  .82 | -.68 | -.55  | [?]
n-Nur         |  .52 | -.40 |  .22  |  .52 | -.78 |  .06 |  .52 | -.17  | [?]
n-Inf         | -.95 | -.23 |  .97  |  .50 | -.52 |  .75 | -.42 | -.34  | [?]
n-Ctn         | [?]  |  .46 | -.29  |  .32 |  .05 | -.96 |  .88 |  .31  | [?]
n-Und         |  .23 | -.23 | -.11  |  .15 | -.49 | -.10 | [?]  |  .06  | [?]
n-Dom         |  .42 | -.17 |  .21  |  .41 | -.37 |  .10 |  .10 | -.10  | [?]
n-Auto        | -.13 |  .54 |  .08  | -.29 |  .20 |  .24 | -.08 | -.71  | [?]
n-Def         |  .09 | -.36 |  .45  |  .55 | -.76 |  .58 | -.18 | -.36  | [?]
n-Aba         | -.89 | -.59 |  .64  |  .00 | -.11 |  .63 | -.55 | -.45  | [?]
n-Ach         |  .15 |  .43 |  .04  |  .07 | -.09 |  .16 | -.08 | [?]   | [?]

ᵃ The needs are identified as in Murray (1938).


Factor H-IIT. The needs of succorance, 
exhibition, abasement, sentience, and affili- 
ation all correlated positively with this fac- 
tor. In contrast to the preceding factor, 
this factor presents strong dependence. 

Factor H-IV. Deference, nurturance, in- 
favoidance or avoidance of failure, extra- 
ception, and dominance correlated posi- 
tively with Factor H-IV. These Ss appear 
to be pursuing goals of social prestige and 
influence. Achievement may be an avenue 
whereby they can be thought well of. 

Factor H-V. Aggression shows a low 
positive correlation, while high negative 
correlations appear with nurturance, affilia- 
tion, deference, infavoidance, and under- 
standing. Achievement appears to be related to a hostile aggressive
denial of tender socialized feelings.

These five factors must be regarded as more illustrative than
confirmed. However, the results do suggest that achievement of high
grades may be motivated by [needs] for power, resentment, dependence,
social acceptance, and aggression. These findings are supported by
anecdotal accounts of counselors.
Four factors were extracted from the matrix of low
achievers. Eleven Ss had correlations of at least .40
with one or more factors.

Factor L-I. The following needs showed high
correlations with this factor: sentience, succorance,
infavoidance, narcissism, and abasement, while
counteraction, or a need to overcome defeat, was highly
negatively correlated. These Ss appear to be
preoccupied with pleasures.

Factor L-II. Nurturance, understanding, extraception,
and affiliation are positively correlated with this
factor, while succorance, abasement, rejection, and
infavoidance are


PERSONALITY AND ACADEMIC ACHIEVEMENT 69 


denied. These persons appear to be insist- 
ently extroverted in their relationships. 
Factor L-III. Extraception is the only positively
correlated need, while sentience, autonomy, narcissism,
exhibitionism, and [illegible] are negatively related
to this factor. This group seems intent on disavowing
social shortcomings.

Factor L-IV. Needs of exhibition, dominance,
understanding, and affiliation are positively related
to this factor, while the need to avoid blame and
threats to his [illegible]. These Ss appear to be
preoccupied with power and acceptance.

The factors for the low achievers reflect trends toward
pleasure seeking, extroversion, denial of social
shortcomings, and power. These factors make sense,
after the fact, in students who are doing poorly in
school.

The important point about these results is not the
confirmation of specific syndromes, but the
demonstration that these different factors do exist.
Their confirmation would point to the next step of
seeking different predictors for each group or the use
of these factors to qualify predictions. Beyond this,
remedial work with failing students may be improved by
a clarification of the systems of attitudes that
appear frequently among them.


SUMMARY AND CONCLUSIONS 


Fourteen high achieving and 14 low
achieving students answered a 300-item
personality questionnaire. The matrices of
phi coefficients between pairs of persons
were factor analysed. Five factors of per- 
sons were found in the matrix of high 
students and four factors were found in the 
other matrix. Although the factors need 
much further confirmation, they are highly 
suggestive. The method employed will per- 
mit identification of homogeneous groups 
of students for whom academic achieve- 
ment may play a similar role. Improved 
prediction and modified counseling tech- 
niques should be made possible by clearer 
recognition of different syndromes of per- 
sonality which are associated with different 
levels of achievement. 
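
The phi coefficient between a pair of persons, on which the person-by-person matrices above are built, can be illustrated with a minimal sketch. It assumes that each questionnaire answer is coded 0 or 1; the function name and the coding are illustrative assumptions and are not taken from the article.

```python
from math import sqrt


def phi_coefficient(person_a, person_b):
    """Phi coefficient between two persons' 0/1 answers to the same items.

    Assumption: answers are dichotomous and coded 0 or 1 (hypothetical coding).
    """
    if len(person_a) != len(person_b):
        raise ValueError("both persons must answer the same items")
    # Cells of the 2 x 2 agreement table.
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(person_a, person_b):
        if a == 1 and b == 1:
            n11 += 1
        elif a == 1 and b == 0:
            n10 += 1
        elif a == 0 and b == 1:
            n01 += 1
        else:
            n00 += 1
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("phi is undefined when a person gives only one answer")
    return (n11 * n00 - n10 * n01) / denominator
```

Computing this value for every pair of persons in a group would give the kind of matrix that is then factor analysed by persons rather than by items.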


REFERENCES 


Frederiksen, N., & Melville, S. D. Differential
predictability in the use of test scores. Educ.
psychol. Measmt., 1954, 14, 647-656.

Ghiselli, E. E. Dimensional problems of criteria.
J. appl. Psychol., 1956, 40, 1-4.

Meehl, P. E. Clinical versus statistical prediction.
Minneapolis: Univer. Minnesota Press, 1954.

Murray, H. A. Explorations in personality.
Oxford Univer. Press, 1938.

Saunders, D. R. Moderator variables in prediction.
Educ. psychol. Measmt., 1956, 16, 209-222.

Stephenson, W. The study of behavior.
Univer. Chicago Press, 1953.

Stern, G. G., Stein, M. I., & Bloom, B. S.
Methods in personality assessment.
Glencoe: Free Press, 1956.


Received August 28, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 2, 1959


PREDICTING PRINCIPALS' RATINGS OF TEACHER PERFORMANCE
FROM PERSONALITY DATA¹


ROBERT F. PECK 


University of Texas


Teacher personality is known to influence
teaching effectiveness (Barr, 1955; Hunt, 
1956; Ryans, 1952; Symonds, 1955). In an 
effort to analyze this relationship further, 
a pilot study was undertaken using projec- 
tive personality data obtained from 49 
teachers, from five school systems, who had 
been nominated by their principals as high, 
average, or low on five aspects of teacher 
behavior (Stiles, 1957). As an exploratory 
measure of the relationship between per- 
sonality and teaching performance, an at- 
tempt was made to predict the principals’ 
ratings from independent analysis of the 
personality data on the teachers.

To venture to predict the evaluation that 
will be given to different teachers, in differ- 
ent school systems, by different principals 
whose individual attitudes and expectations 
are not known, might well be regarded as 
a venture in foolhardiness. To make such
an attempt does require the assumption 
that there are some fairly universal stand- 
ards for judging teaching effectiveness, de- 
spite the many situational factors that 
weigh somewhat differently with each 
school and each community (Ryans, 1957). 
Another assumption is also implied: that 
school principals, whatever their differences 
of opinion, share enough values in common 
in appraising teachers, and are stable 
enough in their judgments, that their eval- 
uations can be predicted with reasonable 
accuracy if the individual personalities of 
the teachers are known. Finally, of course, 
it is a considerable assumption to make 
that the personalities of teachers can validly be
assessed from projective data. Such a study was
undertaken, nonetheless, as an exploratory test of
both concepts and instruments which might prove
useful in a five-year research program now getting
under way.


PROCEDURE 


The Teacher Effectiveness Scale


In order to measure several aspects of the teaching
role which presumably are somewhat independent of one
another, a five-part criterion instrument was
constructed. School principals were asked to nominate
a teacher who was high, one who was average, and one
who was low on each of these five scales:

I. Organizing and communicating information and skills.

II. Creating a healthy relationship with pupils.

III. Creating good relations with other teachers.

IV. Building good relations in the community.

V. Supervisor's personal evaluation ("Who would you
pick to take with you if you moved to a new school?").

A descriptive paragraph was given for each scale, to
provide some concrete illustrations of the intended
meaning of the scale.


¹ This study was part of a research program of the
Laboratory of Human Behavior, Department of Educational
Psychology, University of Texas, a program financed by
funds from the Hogg Foundation for Mental Hygiene.
Relationships among college experiences, individual
mental health, and teaching effectiveness are currently
under study at the Mental Health Demonstration Center,
University of Texas, under a grant from the
National Institute of Mental Health.


RATINGS OF TEACHERS FROM PERSONALITY DATA


The Projective Instruments 


The teachers nominated by the principals were asked to
complete two forms and return them directly to the
research office: a Biographical Information form, and
a Sentence Completion, Form T, a 90-item instrument
designed for use with educational personnel and with
college students of education.


Steps in Data Collection 


[Number illegible] elementary schools were contacted by
letter, and then by personal call, inviting
participation in the study. Ten principals agreed to
take part, and provided Teacher Effectiveness
nominations. Precaution was taken to insure that the
principals' ratings were known only to the principal
and to the research director and his statistical
assistant. A systematic effort was made to obtain a
second, independent rating in each school, such as by
a supervisor; but this turned out to be infeasible.

Each participating principal gave a set of the
biographical and sentence completion forms to each of
the teachers he or she nominated on any of the Teacher
Effectiveness scales, with the invitation to contribute
this information to a study of teachers' attitudes and
interests. Some indication that participation was
generally voluntary may be found in the fact that of
the 10 schools participating, complete data from both
principals and teachers were obtained from five
schools. Forty-nine teachers from these five schools
thus constitute the final research sample.
The cities from which teachers provided data ranged in
size from 5,000 through 15,000 and 30,000 to 500,000.
This gave a wide range and diversity in size of school
system and in cultural milieu. Geographically, the
cities spanned the state from end to end, further
diversifying the cultural contexts of the schools. One
was a rural seat, one a heavily industrialized town,
and one a highly metropolitan city.


² Credit is due to Ralph Duke, Carson McGuire,
[illegible] White, John Newell, and [illegible] for
indispensable help in this phase of the study.
Particularly, the credit due the participating
principals and teachers can only be expressed in
general, so that identities are protected.


The 49 teachers ranged in age from 22 to
70, with a median age of 42: 10 were under 30
years of age; 12 were in their 30's; 15 were
in their 40's; 11 were in their 50's; and one was
70. All but two were women. In experience,
they ranged from one to 50 years of teach- 
ing, with a median experience of 14 years. 
The great majority were married. Some had 
taught continuously since the beginning; 
some had entered teaching after children 
reached school age; and some had taught, 
withdrawn to marry, then resumed teach- 
ing 6, 8, or even 20 years later. Considering 
the high attrition rate in sample size, it is 
as much good luck as planning that the 
final sample was quite heterogeneous. 

It should be noted that there was no 
significant relationship in any of these five 
subsamples between age or experience and 
the effectiveness ratings. 


Analysis of the Projective Data 


The biographical and sentence comple- 
tion forms were coded for school and indi- 
vidual, then separated into five groups ac- 
cording to whichever of the five Teacher 
Effectiveness scales each teacher had been 
nominated on. This gave subsamples as 
follows: Scale I: 13 Ss; II: 11; III: 11;
IV: 13; V: 10. 

A “blind,” qualitative analysis was made 
of each combined pair of Biographical and 
Sentence Completion protocols, and brief 
notes were recorded on each teacher's personality
characteristics. (See Peck & Thompson [1954] for
analogous procedure in another set of studies of
personality and job performance.) Following this, a
comparison was made of the people within 
the subsample for the scale under investi- 
gation, and an effort was made to classify 
each teacher as high, average, or low on 
that scale. There was some confusion fre- 
quently as to whether this rating should 
represent the judge’s own appraisal of the 
person, against the scale as the judge visu- 
alized it, or whether it should represent an 
effort to match the principal’s probable 
rating, regardless of the judge’s personal 
opinion. A compromise was generally at- 
tempted, leaning toward an effort to match 
the principal’s supposed rating. In some 
cases, however, particularly on Scale I, 
the judge found there was a limit beyond 
which he felt impelled to record his own 
evaluation, although he suspected that the 
principal’s rating would differ. The groups
were analyzed and rated in this order: I,
III, IV, II, V, I.

A product-moment correlation was separately computed
for each group. For this purpose, the “principal’s”
rating was treated as a score (e.g., “High” = 1,
“Low” = 3), and the ratings from the projective data
were similarly treated. Ns were small, ranging in size
from 10 to 13 Ss in the various scale-groups. No
reliability measure was attempted, since no second
judge appropriately trained in the use of these
particular instruments was free to participate in the
study at the time. At most, therefore, positive
findings could only serve as encouragement to more
extensive research, with more specific delineation of
the analytic procedures.
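
The scoring and correlation step described above can be sketched as follows. This is only an illustration under stated assumptions: the text gives "High" = 1 and "Low" = 3, while the value 2 for "Average" and all function names are assumptions introduced here.

```python
from math import sqrt

# "High" = 1 and "Low" = 3 are given in the text; "Average" = 2 is assumed.
RATING_SCORES = {"High": 1, "Average": 2, "Low": 3}


def product_moment(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one rater never varies")
    return sxy / sqrt(sxx * syy)


def rating_agreement(principal_ratings, analyst_ratings):
    """Correlate two raters' High/Average/Low labels after numeric scoring."""
    principal_scores = [RATING_SCORES[label] for label in principal_ratings]
    analyst_scores = [RATING_SCORES[label] for label in analyst_ratings]
    return product_moment(principal_scores, analyst_scores)
```

With only three score values and 10 to 13 teachers per scale, a single changed rating can move the coefficient noticeably, which is consistent with the caution the author attaches to these Ns.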


RESULTS


The correlation of principals’ ratings 
with projective-based ratings, for each 
scale, was as follows: 

I. Organizing and communicating in- 
formation and skills. The original correla- 
tion between the projective analyst’s rating 


and the five principals’ ratings was −.42,³
a nonsignificant relationship.

II. Creating a healthy relationship with 
pupils. The correlation here was .19, in- 
significant. 

III. Creating good relations with other 
teachers. The projective analyst's ratings 
correlated .82 with the principals' ratings, 
a relationship significant beyond the .01 
level. 

IV. Building good relations in the community.
The correlation was .89, significant
beyond the .01 level.

V. Supervisor's personal evaluation. The 
correlation of the projective-based rating 
with the principals’ was .84, significant be- 
yond the .01 level. 
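
The significance statements above can be checked with a rough sketch. The article does not say which test was used, so the usual t test for a correlation coefficient (two-tailed, with N − 2 degrees of freedom) is an assumption here. The subsample sizes are those given in the Procedure section, and the critical values are the standard two-tailed .01 entries from a t table.

```python
from math import sqrt

# (reported r, subsample N) per scale, taken from the text.
REPORTED = {
    "I": (-0.42, 13),
    "II": (0.19, 11),
    "III": (0.82, 11),
    "IV": (0.89, 13),
    "V": (0.84, 10),
}

# Two-tailed .01 critical values of t, keyed by degrees of freedom.
CRITICAL_T_01 = {8: 3.355, 9: 3.250, 11: 3.106}


def t_from_r(r, n):
    """t statistic for testing a correlation r from n pairs against zero."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


def significant_at_01(r, n):
    """True when |t| exceeds the two-tailed .01 critical value for n - 2 df."""
    return abs(t_from_r(r, n)) > CRITICAL_T_01[n - 2]


if __name__ == "__main__":
    for scale, (r, n) in REPORTED.items():
        print(scale, round(t_from_r(r, n), 2), significant_at_01(r, n))
```

Under this assumed test, Scales III, IV, and V exceed the .01 criterion and Scales I and II do not, which agrees with the results as reported.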


DISCUSSION


Knowing nothing about the principals, 
the schools or the specific communities 
involved, the analyst was able to “predict” 
the ratings of five different principals on 
three of the Teacher Effectiveness scales, 
with an almost surprisingly high degree of 
accuracy. Yet, curiously, the same analyst, 
when appraising for classroom teaching 
ability and for effect on pupils’ mental
health, disagreed with the principals (and
about equally with all five). This discrepancy
calls for examination.

The first explanation, of course, might be that the
analyst is a poor judge of classroom teaching, and of
mental health. Undoubtedly this is true to some
extent, although there is a certain amount of evidence
to the contrary (Peck & Thompson, 1954; Peck & Parsons,
1956). Apparently the analyst can understand and match
school principals’ ratings with considerable accuracy,
when it comes to judging teachers’ effectiveness in
relating to other teachers, to the community, and to
the principal.


³ Upon rerating this group of teachers at the end of
the study, the analyst knew that his original judgment
was almost opposite in direction to the principals’
ratings, so he tried deliberately to force himself to
rate as the principals would. On several cases,
however, he concluded it would do violence to
[illegible] teacher. The final [illegible] principals’
ratings [illegible] relationship.


A second interpretation might be that 
these particular kinds of data permit pre- 
diction of the latter three kinds of behavior, 
but not of the first two. This seems quite 
unlikely, however, in view of the analytic 
Process that was followed. An over-all 
picture of each teacher's personality pat- 
tern was built up from the data by the 
analyst. Then, from this picture were in- 
ferred the various kinds of behavior to be 
rated. There is no immediately visible rea- 
son why it should prove possible to infer 
complex relations with an (unknown) com- 
munity accurately, yet impossible to infer 
relationships with children, in the much 
better known setting of the classroom. 

Another possible source of error, ac- 
counting for the lack of agreement on 
Scales I and II, has to do with possible incompleteness
or inaccuracy of the principals’ knowledge of what goes
on inside the classrooms. Where the principals have
frequent opportunities to observe teachers at first
hand, as in the situations Scales III, IV, and V
represent, they may achieve an accurate picture of the
teachers’ relative efficiencies in these settings. When
the principals have relatively little opportunity
to see the teachers’ natural behavior, as is 
actually the case with what happens inside 
the classroom in many schools, then the 
principals may not really have an adequate 
basis for judging how effective particular 
teachers actually are, in classroom action. 

Indeed, some indications in the data sug- 
gest that the principals may unconsciously 
be influenced by factors quite apart from 
the teachers’ classroom efficiency. Several 
of those rated high on Scales I and II by 


the principals, give evidence in the pro- 
jective data of a mild but pervasive neur- 
asthenia, combined with a quietly firm 
self-restraint amounting almost to self- 
effacement. These women did not seem to 
the analyst to have either the emotional 
aliveness or the sheer energy to be able to 
attend alertly and respond effectively to 
either the intellectual or emotional states 
and changes of individual children. These 
teachers have an organized way of keeping 
things quiet, however; and perhaps this is 
what marks them in some principals’ eyes, 
rather understandably, as good teachers: 
they keep quiet, keep out of the way, de- 
mand little attention, and meanwhile keep 
the children from “getting in the princi- 
pal’s hair." 

This may not be either a true or a fair
interpretation in the present instance (al- 
though some of the accurate matching of 
the principals’ ratings on the other scales 
proceeded from somewhat similar reason- 
ing). It is important to raise for detailed 
inquiry, however, the question of whether 
efficient custodians are not sometimes mistaken
for effective teachers by principals
too harried to note the distinction. And, 
of course, there may be principals who ac- 
tively prefer and seek to have quietly sub- 
missive teachers, a possibility which would 
raise some question about the mental 
health and emotional climate of those 
schools. 

When one considers the room for error 
at any one of the several stages of the 
study, let alone all of them together, perhaps the
wonder is that any degree of accuracy of prediction
could be achieved. As is always true of the
half-science, half-art of human assessment, it requires
a complex series of “educated guesses.” It requires
them because the number, nature, and interaction of all
the factors that affect such a thing as teaching
effectiveness are far too intricate for adequate
representation by a few “pure” measures of single
variables or by even the best mathematical models
available at present.


SUMMARY AND CONCLUSIONS


Forty-nine experienced elementary school teachers from
five geographically separated cities in Texas were
rated by their principals on a Teacher Effectiveness
instrument. A given teacher was nominated as being
high, average, or low on one of five scales:
I. Organizing and communicating information and skills;
II. Creating a healthy relationship with pupils;
III. Creating good relations with other teachers;
IV. Building good relations in the community;
V. Supervisor's personal evaluation.

A projective personality analysis was made of each
teacher, on the basis of a biographical form and a
sentence completion form. From this, an effort was made
to match the criterion ratings. Correlations of the
projective ratings with principals’ ratings were:
Scale I, −.42; II, .19; III, .82; IV, .89; V, .84.
Highly significant prediction was achieved on the last
three scales. Possible reasons for the disagreements on
intraclassroom effectiveness were discussed. The
positive results suggest that there may be some fairly
universal standards for judging teacher effectiveness;
that various principals’ judgments are stable and valid
enough, at least where the principals have a chance for
first-hand observation, to be predictable; and that
personality can be validly appraised from projective
data (Alexander, 1950). The results appear favorable to
further experimentation with the kind of data and
projective analysis employed in this study.


REFERENCES 


Alexander, T. Prediction of teacher-pupil
interaction with a projective test. J. clin.
Psychol., 1950, 6, 273-276.

Barr, A. S. Measurement and prediction of
teacher efficiency. Rev. educ. Res., 1955,
25, 261-269.

Hunt, J. T. School personnel and mental
health. Rev. educ. Res., 1956, 26, 502-521.

Peck, R. F., & Parsons, J. Personality and
work output. Personnel Psychol., 1956,
9, 49-79.

Peck, R. F., & Thompson, J. The use of individual
assessments in a management development program.
J. personnel Admin. industr. Relat., 1954, 1, 79-98.

Ryans, D. G. A study of criterion data (A
factor analysis of teacher behaviors in
elementary school). Educ. psychol.
Measmt., 1952, 12, 333-344.

Ryans, D. G. Notes on the criterion problem
in research, with special reference to the
study of teacher characteristics. J. genet.
Psychol., 1957, 91, 33-61.

Stiles, L. J. The teacher's role in American
society. New York: Harper, 1957.

Symonds, P. M. Characteristics of the effective
teacher based on pupil evaluations.
J. exp. Educ., 1955, 23, 289-310.


Received August 28, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 2, 1959


THE RELATIONSHIP OF HOME AND SCHOOL EXPERIENCES
TO SCORES ON ACHIEVEMENT TESTS¹


JOHN W. FRENCH 
Educational Testing Service 


In connection with college entrance, an
important part is played by standard va- 
lidity studies designed to find out how well 
aptitude or achievement tests are able to 
predict academic grades. Through such 
studies much can be learned about the 
necessary qualifications for college work, 
but it is also of interest to know what back- 
ground situations and what experiences af- 
fect the test scores. Such information may 
have some implications for college prepa- 
ration, but, more importantly, it can im- 
prove an understanding of the ways in 
which students acquire the knowledges and 
skills that are being measured. 

The tests used in this experiment were 
developed to provide information on a 
student's preparation for college but with
increased emphasis on ability to apply 
knowledge of principles to solutions of 
problems and decreased emphasis on 
knowledge of specific details. There are six 
tests in all: Science Glossary which con- 
cerns scientific language and concepts; 
Science Abilities which concerns the solu- 
tion of scientific problems; Social Studies 
Abilities calling for understandings in the 
field of history; Social Studies Essay using 
the essay approach to the same thing; 
Humanities References covering familiarity 
with ideas in the fields of literature, music, 
and art; and Humanities Abilities covering 
an understanding of literature, music, and 
art. 

The tests were administered to high 
school seniors at 26 private and 15 public 
schools. To permit comparisons between 


¹ The study was supported by the College
Entrance Examination Board in connection 
with the tryout of a series of experimental 
tests entitled Tests of Developed Ability. 




the experimental tests and a standard test, 
only those students who had taken the 
College Board’s Scholastic Aptitude Test 
were included in the study. The students 
also filled out a questionnaire on family 
background, courses taken, courses liked, 
hobbies, reading habits, and their own opin- 
ion of how tests should rate their ability 
or achievement in various fields. The 
schools supplied grades and special ratings 
by teachers. Complete data were obtained 
for 1,275 boys and 725 girls. 


Preliminary Screening Analysis 


Because the variables were quite nu- 
merous, it was desirable to eliminate from 
further study those that had no important 
relationships. For each of the two or more 
categories defined by each variable, the 
number of Ss making low, middle, and 
high test scores were tabulated by the 
IBM 101 machine. To avoid capitalization 
on chance all tabulations were made sepa- 
rately for two random halves of the data, 
referred to later as Sample 1 and Sample 2. 
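
The split into two random halves can be sketched as below. The article does not describe how the halves were drawn, so the seeded shuffle, the function name, and the use of integer record identifiers are illustrative assumptions only.

```python
import random


def split_random_halves(records, seed=0):
    """Shuffle a copy of the records and cut it into two halves.

    The seed only makes the illustrative split reproducible; the original
    study's randomization procedure is not described in the text.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


if __name__ == "__main__":
    # 1,275 boys + 725 girls = 2,000 hypothetical record identifiers.
    sample_1, sample_2 = split_random_halves(range(2000), seed=1)
    print(len(sample_1), len(sample_2))
```

Screening variables on each half separately means that a relationship has to appear in both samples before it is taken seriously, which is the guard against capitalizing on chance that the paragraph describes.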

Inspection of the tables gave no evidence 
of curvilinearity and revealed no important 
discrepancies between the two samples. The
tables were then used for the elimination 
from further study of questionnaire, rating, 
or experience variables having low relation- 
ships with any of the test scores. Unex- 
pectedly, father's occupation and education 
showed little or no relationship to the test 
scores, and were, therefore, eliminated. 
Hobbies in music, art, mechanics, and 
sports were also left out, as were the num- 
ber of books read in mechanics and in 
sports and the number of courses taken in 
art, music, religion, home economics, busi- 
ness, and vocations. Variations in the number of
courses taken in English would undoubtedly
affect some of the test scores,
but this was omitted as a variable, be- 
cause almost all Ss took four years of high 
school English. A liking expressed for some 
courses was found to be important, but the 
variables pertaining to a liking for art, 
music, religion, home economics, business,
and vocational courses were eliminated. 


Having taken physics and having taken
chemistry are two variables which did show 
high relationships with the science tests but 
were omitted from further study, since 
it was thought that the over-all number of 
science courses taken would represent
physics and chemistry well enough. It is 
notable that neither Biology nor General 
Science courses was substantially related 


TABLE 1 


CORRELATIONS OF EXPERIMENTAL TESTS AND SAT SCORES WITH
RATING AND EXPERIENCE VARIABLES FOR BOYS


Variables | Sci. Glos. | Sci. Abil. | Soc. Stud. Abil. | Soc. Stud. Essay | Hum. Ref. | Hum. Abil. | SAT-V | SAT-M

Tests SAT-V -56 | .69 | .79| .49| .75| .69 71 
SAT-M 61 T .62 .38 53 .55 vtl 

School High School Decile -46 | .48| .46| .35| .43| .39| .48 .51 

Ratings | Science Grades -58 | .52 | .40| .24| .25 | .29| .39| .45 

Social Stud. Grades .25 .32 .54 | .35 .42 .33 .46 .93 

Humanities Grades -43 | .38 | .44| .30| .45| .41| .48 41 

Science Rating 60} .58 | .39 | .25 | .30| .34| .42 .46 

Social Stud. Rating .20 31 54 42 | .49 42 45 30 

English Rating .33 43 48 40 -56 52 66 45 

Self Rat- Science Estimate 65 59 30 15} .17 28 33 | .44 

ings Math. Estimate .52 58 35 14| .23 32 35| .71 

Social Stud. Estimate | .00 07 50 29] .35 20 33] .06 

English Estimate .23 29 42| .32| .48 44 55 | .27 

Art Estimate .08 08 06 .01 .15 17 11 |-.04 

Musie Estimate .06 07 02 | .03| .16 23 07 |-.02 

Language Estimate .18 19 27 26 | .42 37 40 32 

Experience | Science Hobby .99 | .39 12 |—.01 | .06 13 12 20 

Variables| Social Hobby 17 19 14 15 | .24 13 18 23 

Writing Hobby ll 18 39 27 .43 38 4l 23 

Science Read 47 38 18 04| .13 12 24 23 

Mechanics Read .19 14 |—.06 |-.10 |-.09 |-.08 |— 07| .07 

History Read .02| .04| .28| .16| .24 11 -16 | .13 

Government Read —.10|-.05| .24| .13]| .24 10| .19 | .02 

Travel Read .00 |—.02 10 |—.03 05 04| .03 |—.15 

Fiction Read :13 15 32 .18 .32 30 35 .20 

Musie or Art Read .01 05 10| .03| .29 14| .15 03 

Science Courses .23 07 |-.13 |-.10 |—.14 |-..16 —.10| .08 

Math. Courses -38 | .39 | .01 |-.01 -00 | .05 05 .40 

pecs) Bud, Courses |-.3i |--.27 | 15 0 | om 

Language Courses Ai 17 34 16 | .45 48 46 35 

Science Liked -39 | .33| .02|-.02 —.02 | ‘og 06 | .08 

Math. Liked 82) 37| .04| 06] .03| 30! 02 42 

Poe Stud. Liked — 30 |-| XS “as | uw] Gal | 

English Liked 7:07 |~.05 | .18| .26 | 28 | :22 | ‘os —.08 

Language Liked —-05 |—.02] .10| .09 -33 | .25| .20| .16 


Note.—The figures are averaged tetrachoric correlations for Samples 1 and 2. 




EXPERIENCES EFFECTING ACHIEVEMENT TESTS 77 


to performance on the science tests. A 
count of the test items from the various 
fields of science does not explain this find- 
ing. 

In the field of social studies no course 
was found to be substantially related to 
the tests. Variability in taking American 
history would presumably have shown a 


relationship to the social studies tests, but 
almost all students had taken the course. 


Results for Selected Variables 


This paper will be mainly concerned with 
what we can call experience variables: 
hobbies, reading, courses taken, and courses 
liked. Courses liked are considered to be ex- 


TABLE 2 


CORRELATIONS OF EXPERIMENTAL TESTS AND SAT SCORES WITH RATING AND
EXPERIENCE VARIABLES FOR GIRLS


Variables | Sci. Glos. | Sci. Abil. | Soc. Stud. Abil. | Soc. Stud. Essay | Hum. Ref. | Hum. Abil. | SAT-V | SAT-M

Tests SAT-V .48 .68 | .81 .49 | .74| .78 .57 
SAT-M E etl 51 37 E .44 | .57 

School High School Decile .92| .49| .54| .29| 44] .41| .59]| .55 

Ratings | Science Grades "36 | .34| .34| .25| .30| .24| .35| .44 

Social Stud. Grades “a4 | .43| .45| .22| .36| .40 | .52| .46 

Humanities Grades :32| .48| .50| .27| .45| -48| -65) .39 

Science Rating ‘51 | .56| .55| .38| .36| .36| .55| .55 

Social Stud. Rating ‘29 | .40| .60| .30| .45| .41] .61| .44 

English Rating .97 .48| .62| .32| .57| .55| .68 .48 

Self Rat- | Science Estimate 43 | .41| .15| .20| .12] .12| .31| .35 

ings Math. Estimate "35 | .54| .25| .20| .14| .19| .258  .72 

Social Stud. Estimate | .03 .16 43 .29 | .32 | .21 Al 04 

English Estimate +24 .32 .48| .23]| .47 AT .66 | .28 

Art Estimate .06 |— .06 .01 |-.04 .15 | .16| .08 —.23 

Music Estimate —.11 |—.12 |-.08 |-.06 | .15 |-.01 |-.02 —.03 

Language Estimate .15 16 37 24| .34 17 38 | .26 

Experience | Science Hobby .22 24 15| .10| .07 09 12 01 

Variables| Social Hobby .06 17 18 mil .09 13 .19 25 

Writing Hobby .15 18 31 .19 .85 27 AL 18 

Science Read +25 12 |-.16 |—.10 |-.09 |—.04 |—.07 02 

Mechanics Read .16 03 09 .04 .02 15 | .08 —.02 

History Read .14 05 18 | .02| .25 22 | .20 05 

Government Read .01 07 19 .02 .23 21 .25 11 

Travel Read al 01 17 .10 .27 11 .21 08 

Fiction Read .08 14 25 15| .29 24 | .35 18 

Musie or Art Read 12 12 20 07 | .30 25 | .25 08 

Science Courses 44 35 07 |-.08 | .00 07 | .07 10 

Math. Courses .07 .15 |—.08 |- .19 .06 05 | .04 36 

Social Stud. Courses |—.11 |—.09 | .03 |—.09 | .03 |-.08 .00 |—.18 

Language Courses ar| .07| .22] .17| .20| .16| -24| 2 

Science Liked .33 15 |—.08 |-.09 |-.09 |-.05 |—.08 |- .06 

Math. Liked .25 31 .08 .02 |-.14 |-.08 |—.04 52 

Social Stud, Liked —.01 |—.04 .30 .23 .21 .14 14 |—.12 

English Liked .10 14 25 09 .25 36 .37 .06 

Language Liked 12 |—.01 21 07 .19 06 17 04 


Note.—The figures are averaged tetrachorie correlations for Samples 1 and 2. 




perienee variables as they indicate very 
important qualitative aspects within a student’s
school experience. As mentioned earlier, education and
occupation of the father, which are also experience
variables, were omitted, because they were found to
bear only a slight relation to test scores as compared
to the other variables just mentioned. Table 1 for Boys
and Table 2 for Girls give tetrachoric correlations
between selected variables and scores on the
experimental tests and Scholastic Aptitude Test (verbal
and mathematical). Suffice it to say here that
correlations of the experimental test scores with
school grades, teachers’ ratings on ability, and self
ratings were all suitably high for corresponding tests
and subject-matter fields.

Analysis. Since the tests can be evaluated in one way
by the extent to which they measure skills and
knowledge picked up from appropriate experiences, we
can consider one kind of validation for the tests to be
the multiple correlation of the relevant experience
variables with scores on each test. Here we have the
reverse of the situation existing in many prediction
studies. Instead of a single criterion and multiple
predictors, we have a single predictor and multiple
criteria. The criteria may be weighted either by
judgment, according to what kinds of experiences each
test should reflect, or so as to be maximally
predictable. Both of these methods of weighting were
tried.

Judgment Criterion. For want of any


TABLE 3
MULTIPLE CORRELATIONS OF EXPERIMENTAL TEST SCORES WITH CRITERIA
CONSISTING OF EXPERIENCE VARIABLES WEIGHTED ACCORDING TO JUDGMENT


                         | Multiple Correlations^a
                         | Boys                | Girls
Tests                    | Sample 1 | Sample 2 | Sample 1 | Sample 2
Science Glossary         | .556     | .504     | .477     | .430
Science Abilities        | .401     | .438     | .360     | .274
Social Studies Abilities | .197     | .248     | .399     | .144
Social Studies Essay     | .106     | .194     | .130     | .008
Humanities References    | .516     | .483     | .489     | .375
Humanities Abilities     | .391     | .395     | .510     | .315


^a Formula after Kelley taken from Dunlap and Kurtz (1932),

$$R = \frac{\bar{r}_c}{\sqrt{\dfrac{1}{k} + \dfrac{k-1}{k}\,\bar{r}_i}}$$

where $\bar{r}_c$ is the mean validity, $\bar{r}_i$ the mean
intercorrelation among predictors, and $k$ the number of predictors.



more insightful judgment, this criterion 
will consist of an equal weighting of all ex- 
perience variables in the restricted field of 
the test. These variables are listed as fol- 
lows: 

Science 
Science Hobby 
Science Read 
Science Courses 
Science Liked 


Social Studies 
History Read 
Government Read 
Social Studies Courses 
Social Studies Liked 


Humanities 

Writing Hobby 

Fiction Read 

Music or Art Read 

English Liked 
Table 3 presents the multiple correlations 
based on equal weightings of the appropri- 
ate four variables. The higher multiple cor- 
relations may be interpreted as instances 
where the test measures what it was judge 
that it should measure. By this criterio? 
the Science Glossary test looks best; thé 
social studies tests look poorest. 
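
As a rough illustration of the equal-weighting approach, the sketch below computes the correlation of a test score with an unweighted sum of k standardized experience variables from the mean validity and the mean intercorrelation. It assumes the usual textbook form of that relationship; the function name and the numbers are hypothetical and are not taken from Table 3.

```python
import math


def equal_weight_correlation(mean_validity, mean_intercorrelation, k):
    """Correlation of a test with the unweighted sum of k standardized variables.

    mean_validity: average correlation of the test with each variable
    mean_intercorrelation: average correlation among the k variables
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    denominator = math.sqrt(1.0 / k + (k - 1) / k * mean_intercorrelation)
    return mean_validity / denominator


# Hypothetical values: four experience variables, mean validity .30,
# mean intercorrelation .20.
print(round(equal_weight_correlation(0.30, 0.20, 4), 3))
```

With a single variable the result reduces to the validity itself, and it grows as more variables with low intercorrelations are added.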

Maximally Predictable Criterion. Separately for each of the three
fields, each sex, and each sample, variables were selected to
constitute a most predictable criterion. Table 4 lists the tests
selected for Sample 1, gives the beta weights, and gives both the
multiple correlations for Sample 1 and the multiple cross-validation,
that is, the figure obtained when the Sample 1 weights were applied to
Sample 2. Table 5


TABLE 4 
MULTIPLE CORRELATIONS OF EXPERIMENTAL TEST SCORES WITH EXPERIENCE
CRITERIA WEIGHTED FOR MAXIMUM PREDICTABILITY, WEIGHTING
BASED ON SAMPLE 1


Beta-Weights and Multiple Correlations
Boys Girls
Test | Experience Variables | β | Validation (Sample 1) | Cross-Val. (Sample 2) | β | Validation (Sample 1) | Cross-Val. (Sample 2)
Science Glos- | Science Hobby .039 — .060 
sary Social Hobby 177 — 
Science Read .365 .320 
Mechanies Read E .125 
Math. Courses 275 .654 .578 — .656 .465 
Math. Liked — .043 .180 
English Liked — .233 
Science Courses = .389 
Science Liked .196 — 
Science Abili- | Science Hobby .105 .225 
ties Social Hobby .141 — 
Science Read .263 .024 
Mechanies Read = .057 
Math. Courses .184 .530 .539 = -618 478 
Math. Liked .091 .335 
English Liked — -272 
Science Courses — 311 
Science Liked .047 = 
Social Studies | Writing Hobby .283 .233 
Abilities Fiction Read .109 074 
History Read .058 = 
Government Read — .132 j 
Music or Art Read — anal bee .153 ioni a 
English Liked —.073 .112 
Language Courses .318 — 
Soc. Stud. Liked .145 .159 
Social Studies | Writing Hobby .164 140 
Essay Fiction Read -158 .130 
History Read — .029 p^ 
Government Read = 4 
Music or Art Read ag .408 -301 | — “002 .307 .266 
English Liked .298 .007 
Language Courses | —.029 — 
Soc. Stud. Liked | —.080 .160 
Humanities Writing Hobby .313 .077 
References Fiction Read .091 .043 
History Read — = s 
Government Read .064 E 
Music or Art Read | — im in .233 Bm | e 
English Liked .054 .253 
Language Courses .384 — 
Language Liked .050 = 




TABLE 4 (Continued) 


Beta-Weights and Multiple Correlations 
Boys Girls 
Test Experience Variables 
Cross- Cross- 
Validation | Val. Validation | Val. 
B (Sample 1) | (Sam- B (Sample 1) | (Sam- 
ple 2) ple 2) 
Humanities Writing Hobby .361 |) .054 
Abilities Fiction Read .083 17 
History Read — —.094 
Government Read | — .120 .128 ALL 
Music or Art Read — :685 1587 .268 suge 
English Liked .094 .310 
Language Courses 524 — 
Language Liked —.176 — 


does the same thing with the samples re- 
versed. 
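
The cross-validation idea behind Tables 4 and 5 can be sketched as follows: standardized weights are fitted in one sample and then applied unchanged to the other. This is only a minimal sketch under assumptions; the article does not describe its computing procedure, and the data, function names, and the least-squares solver here are all hypothetical.

```python
import math


def zscores(values):
    """Standardize a list of numbers (population standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - tail) / aug[i][i]
    return w


def beta_weights(experience, test_scores):
    """Standardized regression weights of the test score on the experience variables."""
    k = len(experience)
    r_xx = [[pearson(experience[i], experience[j]) for j in range(k)] for i in range(k)]
    r_xy = [pearson(x, test_scores) for x in experience]
    return solve(r_xx, r_xy)


def weighted_correlation(experience, test_scores, betas):
    """Correlate the test score with the beta-weighted experience composite."""
    z = [zscores(x) for x in experience]
    composite = [sum(b * zx[i] for b, zx in zip(betas, z)) for i in range(len(test_scores))]
    return pearson(composite, test_scores)


# Hypothetical data: two experience variables and a test score in each sample.
sample1_x = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]]
sample1_y = [1, 3, 2, 5, 4, 6]
sample2_x = [[2, 4, 1, 6, 3, 5], [1, 5, 2, 4, 3, 6]]
sample2_y = [2, 5, 1, 4, 3, 6]

betas = beta_weights(sample1_x, sample1_y)
validation = weighted_correlation(sample1_x, sample1_y, betas)
cross_validation = weighted_correlation(sample2_x, sample2_y, betas)
print(betas, validation, cross_validation)
```

The cross-validated figure can never exceed the multiple correlation obtained with the second sample's own best weights, which is why shrinkage from the validation column to the cross-validation column is the usual expectation.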


of experi- 
that these 


reasons for
this: (a) the judgment of what the tests should measure was imperfect,
or (b) the tests measure something different from


ment read, and number of social studies courses taken do not have the
expected importance in predicting the scores on the social studies
tests. Liking social studies courses is an important contributor, as
expected, but the other good ones include having a writing hobby,
number of language courses taken, and number of books of fiction read.
This seems to mean that the social studies tests are too highly related
to the purely verbal or communication abilities rather than to the more
specialized abilities or understandings needed in the social studies
field. However, the correlations in Tables 1 and 2 suggest that, if
verbal variance as defined by the verbal section of the Scholastic
Aptitude Test were partialled out of the social studies test scores,
beta weights for the number of books read about history and government
would become more prominent.


Discussion and Conclusions 


Although we are studying experience variables with the intention of
judging the effect of experience on test scores, we cannot assume that
high test scores for a certain group are necessarily caused by the
special experiences of that group. For example, Table 1 shows a
positive correlation between number of language courses taken and all
of the experimental tests. What evidence is there that the language
experience directly affected the test scores?


TABLE 5 


MULTIPLE CORRELATIONS OF EXPERIMENTAL TEST SCORES WITH EXPERIENCE
CRITERIA WEIGHTED FOR MAXIMUM PREDICTABILITY, WEIGHTING
BASED ON SAMPLE 2


Beta-Weights and Multiple Correlations


Boys Girls
Test | Experience Variables | β | Validation (Sample 2) | Cross-Val. (Sample 1) | β | Validation (Sample 2) | Cross-Val. (Sample 1)
Science Glos- Science Hobby .076 215 
sary * Writing Hobby -274 .219 
Science Read .249 = 
Math. Courses - 202 — 
Math. Liked son |f 4907 (999 | reg [y 388: a puso 
Language Courses — 338 
Science Courses = .454 
Science Liked .233 .134 
Science Abili- | Science Hobby .162 170 
ties Writing Hobby 1288 -283 
Science Read E — 
Math. Courses .20! 7 L— 
Math. Liked “300 || 9 | F841 aoai 595m 
Language Courses — .242 
Science Courses — -401 
Science Liked 175 —.110 
Social Studies | Writing Hobby .295 .065 
bilities Social Hobby M .106 
Fiction Read .099 155 
History Read .187 .584 .536 | —.022 .427 .439 
Government Read | —.043 E 
Language Courses .166 .147 
Soc. Stud. Liked .198 -237 
Social Studies | Writing Hobby .222 .098 
ABB Social Hobby — .081 
Fiction Read —.028 040 
History Read .127 .385 .331 | —.010 .309 .912 
Government Read | — -060 — 
Language Courses .145 .089 
Soc. Stud. Liked .225 .212 
Humanities — | Writing Hobby .245 .231 
eferences | Fiction Read .106 .056 
History Read = .122 
Music or Art Read .188 .555 .593 zl .457 .452 
English Liked .053 013 
Language Courses | -264 .210 
Language Liked .012 ue 
Humanities Writing Hobby .228 —.010 
llities Fiction Read .189 —.041 
History Read — .268 
Music or Art Read | —.011 .534 .562 .022 .450 .425 
English Liked — .015 300 
Language Courses .299 .218 
Language Liked .084 — 




Let us assume that the special experi- 
ences being considered in this study have 
relatively little effect on the Scholastic 
Aptitude Test. This is a reasonable assumption, because these tests are
known to be relatively noncoachable (Dear, 1958; Dyer, 1953; French,
1955). This assumption makes it possible to get some idea of
the kind of student who undergoes each 
experience. Reference again to Table 1 
shows that the number of language courses 
taken has a very substantial correlation 
with Scholastic Aptitude Test scores. This 
suggests that the language experience may 
have had no effect on the experimental 
test scores, since the high correlation be- 
tween these tests and the number of lan- 
guage courses taken can be accounted for 
by the fact that the students electing five 
years of languages are well above average 
in verbal and mathematical aptitude. 

On the other hand, the number of science 
courses taken correlates well with the sci- 
ence experimental scores, but does not cor- 
relate appreciably with either section of 
the Scholastic Aptitude Test. In this case, 
since the experimental scores are not ac- 
counted for by the general scholastic ap- 
titude of students electing science, it seems 
probable that the science courses or some 
other experiences associated with students 
who take four years of science are respon- 
sible for the high test scores. Partial cor- 
relations were used to demonstrate this 
effect in more detail, but the figures ob- 
tained are not essential to this article. 
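
The reasoning in the last two paragraphs can be illustrated with the standard first-order partial correlation, which removes the influence of a third variable (here, scholastic aptitude) from the relation between an experience variable and a test score. The article does not report its partial correlations, so the figures below are hypothetical and chosen only to show the two contrasting patterns.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical pattern 1: the experience variable is strongly related to
# aptitude, so the experience-test correlation largely disappears once
# aptitude is held constant.
print(round(partial_correlation(0.60, 0.80, 0.70), 3))

# Hypothetical pattern 2: the experience variable is unrelated to aptitude,
# so holding aptitude constant does not reduce the experience-test correlation.
print(round(partial_correlation(0.50, 0.00, 0.60), 3))
```

The first pattern corresponds to the language-courses argument and the second to the science-courses argument.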

The results of this study indicate that 
science experience is well represented in 
the tests. In connection with the variable, 
number of science courses taken, the pre- 
liminary screening study suggests that the 
effect on the test scores is present for 
physics and chemistry but not for general 
science and biology. Since general science 
is almost universally taken by high school 
students, little discrimination by this vari- 
able could be expected. The low relation- 
ship with taking biology is less easy to 


understand, since biology was well repre- 
sented among the test items. 

The humanities tests, as might be ex- 
pected, have high relationships with writ- 
ing and reading activities. However, their 
lack of relationship with music and art 
hobbies confirms what the nature of the 
tests seems to indicate: namely, that schol- 
arly appreciation and knowledge of music 
and art were being measured rather than 
talent or skill. 

The social studies tests are so highly 
verbal that they also measure writing and 
general reading activities rather than ex- 
perience specific to the social studies. A 
real challenge in test construction will be 
to produce a test of understanding in the 
field of social studies which is not too highly 
verbal. The low relationship with courses 
other than American history is a finding 
that is consistent with the emphasis to be 
found in the test items. 

The results of this study can be taken as a demonstration that, when a
test is to be evaluated in terms of its relationship to multiple
experiences (or criteria), useful information on what the test actually
measures can be had by computing the multiple correlation with
experience variables and cross-validating if possible. However, the
only way to evaluate how well the test measures what it should measure
requires use of weights that have been decided upon in advance.


REFERENCES 


Dear, R. E. The effects of a program of intensive coaching on SAT
scores. Princeton, N. J.: Educational Testing Service, 1958.
(Unpublished report.)

Dunlap, J. W., & Kurtz, A. K. Handbook of statistical nomographs,
tables, and formulas. New York: World Book, 1932.

Dyer, H. S. Does coaching help? Coll. Bd. Rev., 1953, No. 19, 331.

French, J. W. An answer to test coaching. Coll. Bd. Rev., 1955, No. 27,
5-1.


Received September 19, 1958. 




JOURNAL OF EDUCATIONAL PSYCHOLOGY 
Vol. 50, No. 2, 1959 


THE EFFECT OF AN INTRODUCTORY PSYCHOLOGY
COURSE ON SELF-INSIGHT¹


FRANK COSTIN 
University of Illinois 


Can a beginning course in psychology increase self-understanding? While
instructors frequently assert that it can, or at least should, little
research has been published showing that it actually does (Birney &
McKeachie, 1955). The purpose of this paper is to report the results of
a study designed to help answer the question.


METHOD 


Four introductory psychology classes were investigated. All of these
pursued the same one-semester course, which was part of a two-year
general education program in the Division of General Studies at the
University of Illinois. (While the Division is an integral part of the
College of Liberal Arts and Sciences, its courses are open to and
attract students from a wide variety of colleges within the
University.) A major objective of the course was to help students
understand their own behavior.

Following a one-week orientation to the nature of psychology, these
topics were studied: perception (two weeks), motivation (three weeks),
learning (three weeks), intelligence (three weeks), and personality
(five weeks).

Each class attended lectures two hours a week, with each half of the
class also meeting as a separate discussion section for an additional
two hours a week. A basic text, references, and films amplified the
scope of the lectures and discussions. The same instructor taught all
four classes over a period of four consecutive semesters.

A 37-item "self-insight" scale, developed by Gross (1948a), was
administered to each class at the beginning and end of the course. This
particular instrument was selected because some studies concerned with
the effects of teaching methods had indicated it had good possibilities
for measuring a rather specific aspect of self-understanding (Gross,
1948b; Ruja, 1954). The scale was also given at the beginning and end
of the semester to a class in verbal communication, another course in
the Division of General Studies. These 97 students were similar to the
psychology students in age and college representation. They had never
taken a college course in psychology and were not currently enrolled in
one. The reason for testing them was to see what effect general college
experience, independent of a psychology course, might have on the kind
of self-insight measured in this study.


¹A briefer version of this paper was presented on September 1, 1958, at
the convention of the APA in Washington, D. C.

The following excerpts from Gross’ de- 
tailed description of his instrument explain 
in part its rationale: “Self-insight is the ac- 
ceptance and admission of both the pres- 
ence and absence of personality traits 
within oneself when this acceptance... 
clashes with one’s feelings of self-esteem” 
(Gross, 1948a, p. 223). “The most severe 
test of self-insight will be found in the abil- 
ity of the individual to accept as true those 
truths which are implicitly or explicitly de- 
nied by social usage and to accept as false 
those falsehoods which are implicitly or explicitly affirmed by social
usage" (Gross, 1948a, p. 222).

The following sample statements illustrate more specifically the nature
of the
scale: "I have criticized other people for 
saying things which I might very well 
have said myself.” “Occasionally I have 
sexual thoughts which I would not like to 
reveal to other people." "I can as easily laugh at myself as at other
people." "I
have no feeling of hostility toward anyone." 
According to Gross, an individual will re- 
veal his degree of self-insight by the manner 
in which he responds to statements like 
these. For example, strong agreement with 
the item, “I have criticized other people for 
saying things which I might very well have 
said myself,” shows a high degree of self- 
insight. Strong disagreement expresses low 
self-insight. On the other hand, strong 
agreement with the item, “I have no feeling 
of hostility toward anyone,” reveals low 
self-insight. Strong disagreement indicates 
high self-insight. 

Students were asked to respond to each 
of the 37 statements in the scale by choosing 
one of five options: strongly agree, agree, 
uncertain, mildly disagree, and strongly 
disagree. In the scoring procedure origi- 
nally developed by Gross, a 2 or 1 was as- 
signed to responses revealing relatively 
high self-insight, and a —2 or —1 to re- 
sponses showing relatively low self-insight. 
“Uncertain” responses were scored 0. Since 
the present study was planned so as to use 
mean scores in estimating changes in self- 


TABLE 1 
SELF-INSIGHT OF STUDENTS BEFORE AND
AFTER A COURSE IN INTRODUCTORY
PSYCHOLOGY AS COMPARED TO A CONTROL
GROUP TAKING A COURSE IN
VERBAL COMMUNICATION


Students in | Students in 
Introductory Communica- 
Psychology | tion Course 
Before Course: 
Mean 117.87 117.30 
SD 16.32 13.05 
After Course: 
Mean 121.63 117.11 
SD 16.32 14.71 
Difference in Means 3.76 —.19 
t 4.37* AT 
N 179 97 


Note. High scores indicate greater self-insight.
* p < .01.


insight, a modification of Gross' scoring system was made in order to
simplify computations. This change consisted of eliminating the
negative sign from −1 and −2, and changing 0, 1, and 2 to 3, 4, and 5,
respectively. Thus, for example, strong agreement with the statement,
“I have no feeling of hostility toward anyone,” received 1; mild
agreement, 2; uncertainty, 3; mild disagreement, 4; and strong
disagreement, 5. Accordingly, the higher a student’s total score on the
scale, the greater was his self-insight.
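
The modified scoring can be sketched in a few lines. This is a minimal sketch that follows the worked example in the text (strong agreement with a statement keyed toward low self-insight scores 1, strong disagreement scores 5, and the reverse for statements keyed toward high self-insight). The option labels, the function names, and the keying flag are assumptions made for illustration; the actual keying of the 37 items is given only in Gross's original report.

```python
# Response options ordered from strongest agreement to strongest disagreement.
OPTIONS = [
    "strongly agree",
    "mildly agree",
    "uncertain",
    "mildly disagree",
    "strongly disagree",
]


def item_score(response, agreement_shows_insight):
    """Score one response on the modified 1-5 scale (5 = highest self-insight)."""
    position = OPTIONS.index(response)  # 0 for strong agreement, 4 for strong disagreement
    if agreement_shows_insight:
        return 5 - position
    return 1 + position


def total_score(responses, keys):
    """Sum item scores; keys[i] is True when agreeing with item i shows insight."""
    return sum(item_score(r, k) for r, k in zip(responses, keys))


# "I have no feeling of hostility toward anyone." (agreement shows low insight)
print(item_score("strongly agree", agreement_shows_insight=False))
# "I have criticized other people for saying things which I might very well
# have said myself." (agreement shows high insight)
print(item_score("strongly agree", agreement_shows_insight=True))
```

On a 37-item scale scored this way, totals can range from 37 to 185, and a student answering "uncertain" throughout would score 111.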


RESULTS 


As Table 1 shows, psychology students made a significant increase in
self-insight score, the mean change being 3.76 scale points. The verbal
communication class, however, revealed no significant change in its
mean score. A comparison of the data for the two groups shows that the
mean self-insight scores of the psychology and the verbal communication
students were approximately the same at the beginning of their
respective courses. Only the psychology students, however, changed
significantly. It would seem, then, that the psychology classes changed
as the result of taking a course in that subject, and not simply
because of general college experiences.
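
The article does not say how the t values in Table 1 were computed. One common choice for before-and-after scores from the same students is a paired t test on the individual changes, sketched below with hypothetical scores (not data from the study); the function name is likewise an assumption.

```python
import math


def paired_t(before, after):
    """t statistic for the mean change in paired before/after scores."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error


# Hypothetical scores for four students (not data from the study).
before = [110, 118, 121, 125]
after = [111, 120, 123, 125]
print(round(paired_t(before, after), 2))
```

If the reported t of 4.37 for the psychology students was obtained in this way, the implied standard error of the mean change would be roughly 3.76 / 4.37, or about 0.86 scale points.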

To discover what relationship might exist between knowledge acquired in
the psychology course and increase in self-insight, students achieving
in the upper half of their class were compared with those in the lower
half. Achievement was measured by four objective examinations given
during the semester, and a comprehensive final examination at the end
of the course. Results of this analysis are described in Table 2. The
self-insight scores of the two groups were approximately the same at
the beginning of the course, the actual difference of 1.59 having a t
value of .65. Upper half students increased their self-insight
significantly by 5.35 scale points. The lower half's increase of 2.14
was not a significant gain.


TABLE 2
SELF-INSIGHT OF STUDENTS ACHIEVING IN UPPER AND LOWER HALVES
OF INTRODUCTORY PSYCHOLOGY

                        Before Course      After Course     Difference
                        Mean               Mean             in
                        Score     SD       Score     SD     Means      t
Upper half (N = 90)     118.66   16.69     124.01   17.08   5.35      4.42*
Lower half (N = 89)     117.07   15.89     119.21   15.14   2.14      1.75

Note. The higher the score, the greater the self-insight.
* p < .01.


An analysis was also made of sex differences in self-insight change.
Men and women students achieving in the upper half of their class were
compared, as were men and women in the lower half. Table 3 shows the
results. Upper-half men and women increased their scores, and to the
same extent. Lower-half women did not change. Lower-half men, however,
increased their scores significantly, and the amount of change (3.37)
did not differ significantly from that shown by upper-half men (5.33).
(This difference of 1.96 had a t value of .48.) It should also be noted
that the initial scores of both male groups revealed no significant
difference, the t value being .51.


TABLE 3
SELF-INSIGHT OF MEN AND WOMEN ACHIEVING IN UPPER AND LOWER
HALVES OF INTRODUCTORY PSYCHOLOGY

                        Before Course      After Course     Difference
                        Mean               Mean             in
                        Score     SD       Score     SD     Means      t
Upper half
  Men (N = 57)          119.37   16.34     124.70   14.80   5.33      3.63^a
  Women (N = 33)        117.42   17.22     122.82   20.37   5.40      2.49^b
Lower half
  Men (N = 60)          117.88   15.53     121.25   15.14   3.37      2.25^c
  Women (N = 29)        115.38   16.52     115.00   14.24   −.38       .20

Note. The higher the score, the greater the self-insight.
^a p < .01.
^b .02 > p > .01.
^c .05 > p > .02.


DISCUSSION 


This study has shown that a significant 
increase in self-insight was made by stu- 
dents who completed a one-semester intro- 
ductory psychology course. Since verbal 
communication students did not change, it 
is reasonable to conclude that the psychol- 
ogy course was the effective agent in pro- 
ducing change, and not simply general uni- 
versity experiences. Gain in self-insight 
apparently had a positive relationship to 
the acquisition of course content, as measured by classroom
examinations, because
the total group of students achieving in the 
upper half of their class, on the basis of ex- 
amination scores, gained significantly in 
self-insight, while the total group of stu- 
dents achieving in the lower half did not 
change. 

In addition to the information which was 
actually measured by course examinations, 
however, other kinds of learnings probably 
contributed to increased self-insight, at 
least in the case of one group of students:
men whose scholastic achievement in the 
course was in the lower half of their class. 
This conclusion is inferred from the fact 
that while both men and women in the 
upper half of their class made significant 
gains in self-insight, men in the lower half 
also gained. This finding does not neces- 
sarily mean, of course, that the increased 
self-insight revealed by the lower-half 
males was independent of any information 
which they acquired from the course. More 
likely, it may mean that some of the knowl- 
edge which influenced them to change, al- 
though learned through taking psychology, 
was simply not measured by the examina- 
tions to any appreciable extent. Or, to state 
this in a slightly different manner, the kind 
of information which the psychology in- 
structor’s tests measured may not have 
been exclusively essential for gaining the 
kind of self-insight measured by Gross’ 
scale. It is also quite possible that retention 
of only a relatively small amount of the 
course content, or acquisition of a rather 
restricted range of information, was suffi- 
cient to result in the attainment of self- 
insight. 

The fact, then, that men whose scholastic 
achievement was in the lower half of their 
class did increase their self-insight makes 
it clear that at least for them a kind of 
course achievement took place that was not 
apparent simply by examining their sub- 
ject matter test scores. Other factors, not 
measured in this study, must also have been influential in increasing
their self-insight. This interpretation, however, must
be considered again in the light of still an- 
other finding: women achieving in the 
lower half of their class did not change sig- 
nificantly. No data were collected to ex- 
plain why this occurred. Some plausible 
speculations can be invoked, such as differ- 
ences in ability and motivation between 
lower-half men and women. (Motivation 
is a more likely clue.) Further investiga- 
tions should be carried out to see whether 
this particular sex difference was unique 
for this study, or whether similar sex dif- 
ferences would continue to be obtained. For 
the purpose of the present investigation, 
the chief value of the finding that lower- 
half women did not gain in self-insight is 
that it gives more weight to the previously 
stated inference: information acquired in 
the course, as measured by classroom ex- 
aminations, played an important and spe- 
cific part in influencing self-insight in- 
crease. 

As stated at the beginning of this paper, 
a major objective of the psychology course 
was to help students understand themselves 
better. The results of the investigation re- 
ported here indicate that this objective was 
achieved. The actual amount of self-insight 
increase, however, was rather small. Or, to 
state this more accurately, there was much 
less change than the investigator had an- 
ticipated. Perhaps it was unrealistic to have 
expected a greater change from so brief an 
encounter with formal psychology. Then, 
too, half of the course dealt with topics 
which were not as likely to increase the kind of self-understanding
measured as were those of motivation and personality. Had this study
been carried out in a beginning course containing more material
on personality and motivation, greater de- 
grees of change might well have occurred. 

How appropriate was the use of the term “self-insight” to describe the
changes measured in this investigation? This particular label was used
because it was the name Gross had given to his scale. One might
reasonably question whether the kinds of responses evaluated with this
instrument
really represented “self-insight.” The con- 
cept certainly means quite different things 
to different psychologists; even those who 
would agree on its meaning might argue 
that its use in this study was inappropriate. 
(The reader will probably already have noticed the similarity between
Gross’ scale and the Lie Scale of the Minnesota Multiphasic Personality
Inventory.) The present writer is inclined to think that it would be
more accurate to employ a specific term like “self-acceptance” or
“decrease in defensiveness” to describe the kinds of changes which were
measured.

Whatever the soundness of the foregoing objections to the rubric
“self-insight,” the fact remains that an introductory psychology course
was demonstrated to be somewhat effective in increasing a readily
measurable kind of self-understanding. Since self-understanding, of one
kind or another, is a goal to which many instructors, and practically
all students of psychology subscribe, it would be highly desirable if
further research of this kind were made. Such research could profitably
be concerned not only with an introductory course similar to that
described in the present report, but with those organized quite
differently. Other kinds of self-understanding, calling for different
kinds of measuring instruments, should also be investigated. (In the
case of advanced courses, a more sophisticated type of device than that
used in the present study would probably be necessary.) As information
of this sort is accumulated, teachers of psychology would be in a
better position to evaluate the extent to which their course goals of
self-understanding are being achieved.


SUMMARY 


Gross’ self-insight scale was administered 
to 179 undergraduates before and after an 
introductory course in psychology, which 
was part of a two-year general education 
program. The group as a whole showed a 
small but significant increase in its self- 
insight score. Both men and women whose 
scholastic achievement in the course was in the upper half of their
class increased their self-insight scores significantly, and to the
same extent. In the lower half, men also increased self-insight
significantly, but women did not change. It was concluded that changes
in self-insight were positively related to information acquired from
the course as measured by objective examinations, although course
learnings not evaluated by the examinations also probably played a part
in effecting change. As a control measure, Gross' scale was given to 97
students before and after a verbal communication course, also a part of
the general education program. No significant change in self-insight
score occurred.


REFERENCES

Birney, R., & McKeachie, W. The teaching of psychology: A survey of
research since 1942. Psychol. Bull., 1955, 52, 51-68.

Gross, L. The construction and partial standardization of a scale for
measuring self-insight. J. soc. Psychol., 1948, 28, 219-236. (a)

Gross, L. An experimental study of the validity of the non-directive
method of teaching. J. Psychol., 1948, 26, 243-248. (b)

Ruja, H. Nondirective teaching and self-insight: A statistical
addendum. J. gen. Psychol., 1954, 51, 331-332.


Received October 8, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 2, 1959 


RETENTION IN ARITHMETIC AMONG CHILDREN OF LOW,
AVERAGE, AND HIGH INTELLIGENCE
AT 117 MONTHS OF AGE¹


HERBERT J. KLAUSMEIER AND JOHN F. FELDHUSEN²


University of Wisconsin 


Considerable research in learning efficiency and retention has been
completed
during the last 50 years. McGeoch and 
Irion (1952, p. 376) cautiously summarized 
their review of research thus: "It seems, 
then, that the slow learner gains no ad- 
vantage in retention from his slowness, 
and that the fast learner is at no disad- 
vantage because of his superior speed." 
The studies reviewed by McGeoch and 
subsequent research generally include these 
features: the learning task content is non- 
sense syllables, unrelated words or nu- 
merals, pictures or color designs; the slow 
and fast learners are identified according 
to scores made on an initial learning task 
that is common to all the subjects; a va- 
riety of procedures are followed in the 
presentation of the acquisition task, in setting the interval between
acquisition and
retention, and in securing and treating the 
retention scores in relation to the acquisi- 
tion scores. 

The researchers in the present study, 
different from most previous researchers, 
were concerned with fast and slow learn- 
ers as usually defined in school situations; 
the learning tasks were limited to those 
intimately related to arithmetic learning 
and instruction in the elementary school; 
the learning task was graded to the achieve- 
ment level of each child; and two condi- 
tions of learning and two intervals of 
retention were employed. The primary purpose of this investigation was
to test the hypothesis: retention of arithmetic learning is the same
among children of low, average, and high intelligence at a mean age of
117 months when the original task is graded to the learner's
achievement level.


¹The research reported herein was performed pursuant to a contract with
the United States Office of Education, Department of Health, Education,
and Welfare.

²The project was under the direction of Klausmeier with Feldhusen and
John Check serving as research assistants.


METHOD 


Subjects. The sample consisted of 20 
boys and 20 girls of low intelligence (WISC 
IQs 56-81), 20 boys and 20 girls of aver- 
age intelligence (WISC IQs 90-110), and 
20 boys and 20 girls of high intelligence 
(WISC IQs 120-146). The low IQ children were drawn from two special
classrooms in Madison and ten special classrooms in Milwaukee. Excluded
from the study were low IQ children exhibiting a second handicap to a
marked degree or definite organic symptoms of retardation such as
mongolism. The average and high IQ children were drawn from 11 regular
classrooms in Madison. All children had birthdates between September
15, 1947 and December 15, 1948. The mean age of the children at the
midpoint of the study, February 15, 1958, was 117 months.
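
As a quick check on the sampling window, the sketch below computes the age in whole months at the study midpoint for the earliest and latest birthdates given above. The helper function is hypothetical and is meant only to show that the stated window brackets the reported mean age.

```python
from datetime import date


def age_in_months(birthdate, on_date):
    """Whole months elapsed between a birthdate and a later date."""
    months = (on_date.year - birthdate.year) * 12 + (on_date.month - birthdate.month)
    if on_date.day < birthdate.day:
        months -= 1
    return months


midpoint = date(1958, 2, 15)
oldest = age_in_months(date(1947, 9, 15), midpoint)
youngest = age_in_months(date(1948, 12, 15), midpoint)
print(oldest, youngest, (oldest + youngest) / 2)
```

The window runs from 110 to 125 months at the midpoint, so a mean of 117 months sits close to its center.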

Procedure. A counting and an addition task were developed and
administered to each child individually in a room of the school
building. Retention measures were secured exactly five minutes and
approximately six weeks after the first acquisition measure for each
child.

The procedure for the counting task is now presented. Selecting a task
at each child's achievement level was accomplished by means of a survey
test which involved a procedure similar to the vocabulary test of the
Stanford Binet. Each child was asked to count without assistance by
1's, 2's, 3's, etc., in ten steps, i.e., 3's to 30, 13's to 130, until
a level was found where he made two consecutive errors, or after
encouragement quit or stated he could not go on. The ten numbers or
items at this level, which he could not count correctly without
assistance, constituted his new task to be learned. He was then taught
the ten numbers during a period of 19 minutes. If he learned the ten
numbers, e.g., 9, 18, 27 ... 90, in less than 19 minutes, he was given
practice activities throughout the remaining time. At the end of 19
minutes, he was asked to count aloud the ten new numbers he had
learned; no assistance was given. The total correct was his acquisition
score. For every child, regardless of level, the score could vary from
0-10. Following the acquisition test, a 5-minute reading period ensued
for the child. Then, the 5-minute retention score was secured by having
the child again count the ten numbers aloud without assistance. The
6-week retention test was identical to the acquisition and the 5-minute
retention test. The 5-minute and the 6-week retention scores in
counting could also vary from 0-10 for each child.

The addition task was started for each child 12 weeks after the
counting task. The survey, teaching, acquisition, reading period,
5-minute retention, and 6-week retention procedures for addition were
the same as for counting except for one variable. As soon as the child
learned the ten addition items, teaching was stopped, the acquisition
test given, and the 5-minute reading period started. Thus, in the
counting task, both the time to learn (19 minutes) and the number of
items to be learned (ten) were held constant for all children; this, in
turn, permitted different amounts of time for overlearning. In the
addition task, the number of items taught to and performed correctly
once by the child was also held constant, but time to learn the ten
items was allowed to vary up to a maximum of 17 minutes in order to
prevent overlearning. This purposeful variation of the learning
arrangement was done to allow testing of the hypothesis under
conditions of no overlearning but unequal amount of time to learn, and
of overlearning but equal time to learn.

Four additional features of the overall 
procedures are now clarified. 

The final placement of items according 
to difficulty levels in the survey tests of 
counting and addition was done after ex- 
tensive review of general research on grade 
placement of arithmetic learnings, of cur- 
riculum guides in arithmetic generally, of 
the specific guides and textbooks in the 
cooperating schools, and tryout with a 
small number of children. The counting 
task levels actually found through the sur- 
vey test ranged from counting by 1’s to 
3’s for the low IQ children, with pennies in
some instances; for the average IQ, from 
3’s to 16’s; and for the high IQ, from 3's 
to 23’s. The median levels were counting 
by 2’s, 7’s, and 12’s for the three groups re- 
spectively. The addition task levels for the 
low IQ children ranged from adding dig- 
its to adding 2-place numbers, 2 addends, 
with carrying; 39 of the 40 average IQ 
children added without assistance items in- 
volving 3 addends, 3-place numbers and 
38 had as their new task level adding yards, 
feet, and inches; 33 of the high IQ group 
also had the same linear measure addition 
with only 3 moving into the level involving 
common fractions and mixed numbers. 

While the child was learning to count, any error he made was called
to his attention and corrected. To each of his initial correct
responses, the researcher usually followed immediately with
“Right,” “Correct,” or “Good,” and with the same words to a series
of correct responses once learned. In the addition task every
incorrect response was called to attention and corrected; every
correct response also was called to the child's attention with
words as above and as soon as a correct response was made, the next
of the 10 items was started (for further details, see Feldhusen,
1958). Thus, all ten of the possible ten correct responses in each
task were reinforced.

Since the interval between the first presentation of the counting
task and the addition task was approximately 12 weeks, it was felt
that little intertask effect would occur, i.e., the counting task
would not affect the addition task. Had such an effect occurred, it
was assumed that the results of the analysis of the addition task
would not be affected in such a way as to alter the pattern of
means and differences since all of the children were taught both
tasks with about the same time interval between tasks.

Six members of a larger research group, including the two authors,
administered the survey-teaching-acquisition-retention sequence. To
assure uniformity, a small number of children were brought into the
learning laboratory as instructional subjects for the research
team. Weekly meetings were also held to clarify procedures during
the six months the data were gathered.


RESULTS 


The means and standard deviations of 
the acquisition, 5-minute retention, and 6- 
week retention scores in the counting and 
the addition tasks are presented in Table 
1. The mean acquisition scores of 7.55, 9.52, 
and 9.65 in counting and of 6.28, 5.70, and 
7.38 in addition suggest that, though the 
task level may have been quite accurately 
identified for each child both in counting 
and in addition, counting a series of ten 
items and adding a series of ten items at 
the identified task level was not equally dif- 
ficult, according to IQ levels. The high IQ 
group shows the highest mean acquisition 
in both tasks; however, the low IQ group 
is lowest in counting (7.55), while the 
average IQ group is lowest in addition 
(5.70). It was previously noted that 38 
of the average and 33 of the high IQ group 
had the same task level in addition—the 
adding of yards, feet, and inches. These 
71 individuals thus had ten identical addi- 
tion items to learn, but the high group 
learned more items than did the average. 

Partly because the mean acquisition scores were not the same for
the three IQ groups, analysis of covariance (Edwards, 1950) was
used in testing the hypothesis. Also, analysis of covariance was
used because of the high correlations between acquisition and
5-minute retention scores, as shown in Table 2. These relatively
high correlations suggest indirectly that the procedures used in
surveying, teaching, and measuring acquisition and 5-minute
retention were quite accurate.


TABLE 1
MEANS AND STANDARD DEVIATIONS OF ACQUISITION, 5-MINUTE RETENTION, AND
6-WEEK RETENTION SCORES IN COUNTING AND ADDITION

                           Counting                                 Addition
IQ Group    N    Acq.        5-Min. Ret.  6-Wk. Ret.     Acq.        5-Min. Ret.  6-Wk. Ret.
                 M     σ     M     σ      M     σ        M     σ     M     σ      M     σ
Low IQ      40   7.55  2.53  7.35  2.90   5.28  3.11     6.28  2.78  6.18  2.92   4.80  2.83
Av. IQ      40   9.52  1.34  9.50  1.14   7.10  2.90     5.70  2.36  6.00  2.57   3.15  1.59
High IQ     40   9.65   .94  9.65   .73   7.98  2.77     7.38  1.90  7.88  1.81   5.23  3.2?


TABLE 2
CORRELATIONS BETWEEN ACQUISITION AND RETENTION SCORES
IN COUNTING AND ADDITION

                         Counting                      Adding
IQ Level    N      Acquisition   Acquisition     Acquisition   Acquisition
                   and 5-Min.    and 6-Wk.       and 5-Min.    and 6-Wk.
                   Ret.          Ret.            Ret.          Ret.
Low         40     .76           .39             .89           .63
Average     40     .66           .23             .87           .27
High        40     .63           .29             .89           .67
Total       120    .79           .42             .89           .56


TABLE 3
ANALYSIS OF COVARIANCE FOR ACQUISITION, 5-MINUTE RETENTION, AND 6-WEEK
RETENTION MEASURES IN COUNTING AND ADDITION WITH ACQUISITION
THE COVARIATE

Analysis Area            Source of Variation   df     Mean Square   F(a)    byx
Acquisition & 5-Min.     Adjusted means        2      4.46          2.67
  Ret.: Counting         Within groups         116    1.67                  .768
                         Total                 118
Acquisition & 6-Wk.      Adjusted means        2      21.22         2.64
  Ret.: Counting         Within groups         116    8.05                  .516
                         Total                 118
Acquisition & 5-Min.     Adjusted means        2      4.06          3.24
  Ret.: Addition         Within groups         116    1.4                   .916
                         Total                 118
Acquisition & 6-Wk.      Adjusted means        2      18.77         3.60
  Ret.: Addition         Within groups         116    5.22                  .598
                         Total                 118

(a) For significance at the .05 level, F must equal or exceed 3.08;
for significance at the .01 level, 4.80. The .01 level of
significance is required for acceptance or rejection of the
hypothesis.


TABLE 4
ADJUSTED MEANS FOR RETENTION IN COUNTING AND ADDITION WITH
FIRST ACQUISITION SCORES AS THE COVARIATE

                  Counting               Addition
IQ Group      5-Minute   6-Week      5-Minute   6-Week
Low           8.40       5.35        6.34       4.90
Average       9.03       6.78        6.69       3.60
High          9.08       7.60        7.03       4.67


The results of the four analyses of covariance for the acquisition
and two retention measures in counting and in addition are shown in
Table 3. With 2 and 116 degrees of freedom as shown, an F of
approximately 3.08 is required for significance at the .05 level
and of approximately 4.80
for the .01 level. No F obtained is suffi- 
ciently large to be significant at the .01 
level. The .01 level was required for ac- 
ceptance or rejection of the hypothesis, but 
differences at the .05 level are reported. 
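
The decision rule stated above can be sketched in a few lines. This is only an illustration: the mean squares are those printed in Table 3 for the two 6-week analyses, the critical values are the ones quoted in the text, and the function names `f_ratio` and `verdict` are made up for the example.

```python
# Mean squares as printed in Table 3: (adjusted means, within groups).
# Only the two 6-week analyses are used in this illustration.
ANALYSES = {
    "counting, 6-week": (21.22, 8.05),
    "addition, 6-week": (18.77, 5.22),
}

# Critical values quoted in the text for 2 and 116 degrees of freedom.
F_CRIT_05 = 3.08
F_CRIT_01 = 4.80


def f_ratio(ms_adjusted: float, ms_within: float) -> float:
    """F is the adjusted-means mean square over the within-groups mean square."""
    return ms_adjusted / ms_within


def verdict(f: float) -> str:
    """Label an F ratio against the two quoted critical values."""
    if f >= F_CRIT_01:
        return "significant at .01"
    if f >= F_CRIT_05:
        return "significant at .05 only"
    return "not significant"


for name, (ms_adj, ms_within) in ANALYSES.items():
    f = f_ratio(ms_adj, ms_within)
    print(f"{name}: F = {f:.2f} ({verdict(f)})")
```

Under these assumptions the two ratios round to the tabled Fs of 2.64 and 3.60, and neither reaches the .01 criterion.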

The Fs of 3.24 and 3.60 are significant
beyond the .05 but not at the .01 level. 
Table 4 shows that for addition the ad- 
justed mean scores for 5-minute retention 
and 6-week retention by IQ groups were as 
follows: low IQ, 6.34, 4.90; average IQ, 
6.69, 3.60; high IQ, 7.03, 4.67. Table 4 also 
gives the adjusted retention means for 
counting by IQ group and for 5-minute and 
6-week retention as follows: low, 8.40 and 
5.35; average, 9.03 and 6.78; and high, 
9.08 and 7.60. 
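
As an illustration of how adjusted means of this kind are usually obtained, the sketch below applies the standard covariance adjustment (group retention mean minus b times the group's deviation from the grand acquisition mean) to the counting data for 5-minute retention. It assumes equal group sizes, takes the group means from Table 1 and the coefficient .768 from Table 3, and uses the made-up helper name `adjusted_means`; it is not presented as the authors' own computation.

```python
# Unadjusted group means from Table 1 (counting task).
acquisition = {"low": 7.55, "average": 9.52, "high": 9.65}
retention_5min = {"low": 7.35, "average": 9.50, "high": 9.65}

# Within-groups regression coefficient from Table 3
# (acquisition and 5-minute retention, counting).
B_YX = 0.768


def adjusted_means(y_means, x_means, b):
    """Adjust each group's Y mean for its distance from the grand X mean.

    Assumes equal group sizes, so the grand X mean is the mean of the
    group means.
    """
    grand_x = sum(x_means.values()) / len(x_means)
    return {g: y_means[g] - b * (x_means[g] - grand_x) for g in y_means}


adjusted = adjusted_means(retention_5min, acquisition, B_YX)
for group, value in adjusted.items():
    print(f"{group}: {value:.2f}")
```

The values produced this way agree with the 5-minute counting entries of Table 4 (8.40, 9.03, 9.08) to within rounding of the published means.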

Since no F was significant at the .01 
level, the researchers conclude that re- 
tention of arithmetic learning is the same 
among children of low, average, and high 
intelligence at a mean age of 117 months 
when the original task is graded to the 
learner's achievement level. 


SUMMARY 


The purpose of this study was to test the 
hypothesis that retention of arithmetic 
learning is the same among children of low, 
average, and high intelligence when the 
original task is graded to the learner's 
achievement level. The two graded arith- 
metic tasks were ten items in counting and 


ten items in addition at a difficulty level 
found appropriate for each child. The con- 
ditions of learning were with time and num- 
ber of items held constant in the counting 
task; but in the addition task time was 
permitted to vary to eliminate overlearn- 
ing. In both tasks, the correct response to 
each item was reinforced during initial 
learning. The interval between acquisition 
and 5-minute retention was under complete 
control of the researchers, while the in- 
terval between 5-minute and 6-week re- 
tention was under no control. The interval 
between the counting and the addition task 
was 12 weeks. Using analysis of covariance, 
the researchers found no difference among 
means of the three IQ groups significant 
at the .01 level and conclude that reten-
tion is the same for the three groups as
hypothesized.


REFERENCES 


Edwards, A. L. Experimental design in psychological research. New
York: Rinehart, 1950.

Feldhusen, J. F. A study of efficiency of learning and retention in
arithmetic among children of low, average, and high intelligence.
Unpublished doctoral thesis, Univer. Wisconsin, 1958.

McGeoch, J. A., & Irion, A. L. The psychology of human learning. New
York: Longmans, 1952.


Received October 27, 1958.


THE JOURNAL OF
EDUCATIONAL PSYCHOLOGY


Volume 50          June, 1959          Number 3


GOAL-SETTING BEHAVIOR AS A FUNCTION OF SELF-ACCEPTANCE,
OVER- AND UNDERACHIEVEMENT, AND RELATED
PERSONALITY VARIABLES

JAMES V. MITCHELL, JR.

University of Texas

It has seldom been doubted that goal-setting behavior is strongly
influenced by the organization of personality, and several
investigators have attempted to ferret out those variables of
personality that are the major determinants of goal-setting
behavior (Cohen, 1954; Frank, 1935; G[illegible], 1940; Gould &
Kaplan, 1940; [illegible], 1945; Sears, 1941). This has not been an
easy task, and it becomes apparent to most that the actions and
interactions of variables influencing goal-setting are quite
complex and not always subject to simple explanations. In the
present study it was not the author's purpose to determine all of
the personality variables that influence goal-setting behavior, but
rather to compare the patterns of goal-setting behavior and related
personality characteristics of groups that were systematically
varied with respect to the two variables of self-acceptance and
under-overachievement.


PROCEDURE

Subjects

Ss were 100 female college students who were majoring in elementary
and secondary education and who were either freshmen or sophomores
at the time of the investigation.

Measures

The goal-setting behavior of these Ss was studied in the actual
classroom situation. Ss were all members of the author's televised
course in elementary educational psychology and hence received
exactly the same instruction, except for short discussion periods
led by graduate assistants (Miami University, 1957; Mitchell,
1958). Objective examinations were administered at the end of the
fifth, eleventh, and sixteenth weeks, and immediately before each
examination the subjects were asked the following questions in the
order indicated: (a) What grade can you reasonably hope to attain
on this examination? (b) What grade do you actually expect to get?

At the very beginning of the course the Ss had been asked the same
questions for the entire course, and this was also available for
analysis. Also recorded for each S was the actual grade received on
each examination and for the course as a whole. Ss were allowed to
use “+” and “-” in setting their goals, and values from zero
through nine were assigned to the letter grades A and higher, B+,
B, B-, C+, C, C-, D+, D & D-, and F respectively, the lower values
being assigned to the higher grades. With these numerical
conversions one could represent the difference between an S's level
of aspiration and actual grade obtained by subtracting the
corresponding value of her actual grade from the value for her
level of aspiration, e.g., LA = B or 2; actual grade = C+ or 4;
difference = -2. Other operations similar to this were also
performed. In computing grades for the course the author made
certain that the proportions of Ss receiving grades in each of the
ten categories described above were the same for every examination,
thus keeping this factor constant throughout the entire course.
Both grades and aspiration estimates had reasonably normal
distributions for each examination, although the means of the
aspiration estimates, of course, were consistently higher than the
comparable means of the actual grades obtained.
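
The grade coding described above can be sketched as a small lookup table. The dictionary and function names are illustrative, and treating "A+" as part of "A and higher" is an assumption; the text does not say how an A- would have been coded, so it is left out.

```python
# Numerical values for letter grades: lower values mean higher grades.
# "A and higher" is coded 0; D and D- share one category, giving ten in all.
GRADE_VALUES = {
    "A+": 0, "A": 0,          # assumption: "A and higher" covers A and A+
    "B+": 1, "B": 2, "B-": 3,
    "C+": 4, "C": 5, "C-": 6,
    "D+": 7, "D": 8, "D-": 8,
    "F": 9,
}


def la_minus_grade(level_of_aspiration: str, actual_grade: str) -> int:
    """Value of the aspired grade minus value of the grade obtained.

    A negative result means the grade obtained fell below the aspiration.
    """
    return GRADE_VALUES[level_of_aspiration] - GRADE_VALUES[actual_grade]


# The worked example from the text: LA = B (2), actual grade = C+ (4).
print(la_minus_grade("B", "C+"))  # -2
```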

“Self-Acceptance” was measured by means of Bills' Index of
Adjustment and Values (Bills, 1951; Bills, 1953). The original
Index consists of 49 adjectives, and for each of these the S is to
rate himself on a five-point scale with respect to (a) how much of
the time the trait is characteristic of him (e.g., “optimistic”),
(b) how the S “feels about” himself as described in the first
operation (i.e., I . . . [1] very much dislike . . . [2] dislike
. . . [3] neither dislike nor like . . . [4] like . . . [5] like
very much . . . being as I am in this respect), (c) how much of the
time the S would like to have more of the trait in question (again
on a five-point continuum). To this original Index were added three
new adjectives related to desire to achieve: “industrious,”
“hard-working,” and “motivated.”

In computing an index of under- and 
overachievement the ACE scores for all 
Ss were normalized and then converted 
to standard score form. The numerical 
grade average for each S for the semester 
in which the study was conducted was 
also computed, and these were also nor- 
malized and converted into standard score 
form. The index of under- and over- 
achievement for each S was then defined 
as the S’s Z score for her semester grade 
average minus her Z score for her ACE 
performance. 
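
A minimal sketch of the index follows. It simplifies the procedure: the scores are only standardized (the prior normalizing step is omitted), the population standard deviation is an arbitrary choice, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def achievement_index(grade_averages, ace_scores):
    """Z score for semester grade average minus Z score for ACE performance."""
    return [g - a for g, a in zip(z_scores(grade_averages), z_scores(ace_scores))]


def label(index_value):
    """Positive index: overachiever; negative index: underachiever."""
    if index_value > 0:
        return "overachiever"
    if index_value < 0:
        return "underachiever"
    return "unclassified"  # a zero index is not covered by the text


# Hypothetical grade averages and ACE scores for four Ss.
grades = [2.0, 3.0, 4.0, 3.0]
ace = [100, 120, 110, 130]

index = achievement_index(grades, ace)
labels = [label(i) for i in index]
print(labels)
```

Because both sets of scores are standardized on the same Ss, the index values sum to zero, so the group necessarily contains both over- and underachievers unless every index is exactly zero.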

To analyze the related personality characteristics of the variables
described above, a personality inventory was administered that
included all of the items of the Taylor Manifest Anxiety Scale
(Holtzman, 1952; Taylor, 1953), some additional MMPI items
concerned with general mental health, and some items of the
author's own devising that were related to the student's desire to
achieve.


Design 


A preliminary analysis of data in which
the Bills’ “self-acceptance” scores were
plotted against the indices of under- and
overachievement yielded a nonsignificant
correlation close to zero but revealed a
“butterfly” configuration with significantly 
greater variability in achievement-intelli- 
gence ratios for the “self-acceptant” and 
“self-rejectant” extremes of the Bills’ scale 
than for the moderately acceptant middle 
values. On the basis of these findings it 
was hypothesized that the underachievers 
and overachievers at either end of the 
Bills’ scale were psychologically distinct 
groups that would exhibit different kinds 
of goal-setting behavior and different kinds 
of related personality characteristics. 

A “self-acceptant” group (N = 60) and a “self-rejectant” group
(N = 40) were then selected from either extreme of the Bills'
distribution on the basis of observable differences in variability
between them and the more constricted middle group, and only these
extreme groups were used in the major analysis. Ss in the
“self-acceptant” group had total scores of [illegible] and above on
the Bills' “self-acceptance” scale, while the “self-rejectant”
group had total scores of 149 and below. Both of these groups were
further subdivided into “overachievers” and “underachievers,” with
those having positive index values assigned to the former group and
those with negative index values assigned to the latter. There were
thus four separate groups subject to the analysis: (a) the
self-acceptant underachievers (N = 31), (b) the self-acceptant
overachievers (N = 29), (c) the self-rejectant underachievers
(N = 16), and (d) the self-rejectant overachievers (N = 24). The
median ACE percentile ranks for each group, based upon the national
norms, were 70, 55, 79, and 35, respectively. A two-way analysis of
variance with unweighted means was applied to the normalized Z
scores of the ACE and yielded an F of .86 (N.S.) between the
self-acceptant and self-rejectant groups and an F of 21.95 between
the underachievers and overachievers, an expected difference which
was significant (p < .001). The numerical grade point averages for
the semester in which the study was conducted, based upon a
conversion system for which A = 4, B = 3, C = 2, etc., were 2.14,
2.92, 2.34, and [illegible], respectively. When the same analysis
of variance technique was applied to these data, the resultant F of
.01 between the self-acceptant and self-rejectant groups was not
significant, while the F between the underachievers and
overachievers was 26.50, another expected difference that was
significant (p < .001). Interactions for both the ACE and the grade
point averages were insignificant.

For each of the four subgroups of the study the following
goal-setting data were secured:

1. Mean level of aspiration (LA) for each examination (from
responses to “What grade can you reasonably hope to attain on this
examination?”).

2. Percentages of students actually achieving their levels of
aspiration for each examination.

3. Mean grade expectancy (Gr. Exp.) for each examination (from
responses to “What grade do you actually expect to get on this
examination?”).

4. Mean of the differences between each S's level of aspiration and
her expected grade (i.e., LA - Gr. Exp.) for each examination.

5. Mean of the differences between each S's level of aspiration and
her actual grade obtained (i.e., LA - Gr.) for each examination.

6. Mean of the differences between each S's expected grade and her
actual grade obtained (i.e., Gr. Exp. - Gr.) for each examination.

7. Mean of the differences between each S's actual grade obtained
on the last examination and her level of aspiration for the next
examination (i.e., Gr1 - LA2 and Gr2 - LA3).

8. Mean level of aspiration for the course as a whole, taken at the
very beginning of the course (LAc).

9. Mean of the differences between each S's level of aspiration for
the course and the final grade received for the course (i.e., LAc
- Fin. Gr.).

10. Percentages of students actually achieving their levels of
aspiration for the course as a whole.
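
The per-examination measures in this list can be illustrated with a short sketch. The records below are hypothetical, the function names are invented for the example, and the rule used for "achieving" a level of aspiration (grade obtained at least as high as the aspired grade) is an assumption, since the text does not define it.

```python
from statistics import mean

# Hypothetical coded records for one subgroup on two successive examinations.
# Each tuple is (level of aspiration, grade expectancy, grade obtained),
# coded 0 (A and higher) through 9 (F), so lower values are higher grades.
exam1 = [(2, 3, 4), (1, 2, 1), (3, 3, 5)]
exam2 = [(3, 4, 4), (2, 2, 1), (4, 5, 5)]


def summarize(exam):
    """Per-examination measures corresponding to items 1 through 6."""
    return {
        "mean_LA": mean(la for la, _, _ in exam),
        "mean_GrExp": mean(exp for _, exp, _ in exam),
        "LA-GrExp": mean(la - exp for la, exp, _ in exam),
        "LA-Gr": mean(la - gr for la, _, gr in exam),
        "GrExp-Gr": mean(exp - gr for _, exp, gr in exam),
        # Assumption: an S "achieves" her LA when the grade obtained is at
        # least as high as the aspired grade, i.e. its coded value is <= LA.
        "pct_achieving_LA": 100 * sum(gr <= la for la, _, gr in exam) / len(exam),
    }


def previous_grade_minus_next_la(previous_exam, next_exam):
    """Item 7: mean of (grade on the last exam - LA for the next exam)."""
    return mean(prev[2] - nxt[0] for prev, nxt in zip(previous_exam, next_exam))


print(summarize(exam1))
print(previous_grade_minus_next_la(exam1, exam2))
```

Because lower coded values stand for higher grades, a negative LA - Gr. mean indicates that grades obtained fell short of aspirations on average.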


All four subgroups were also compared 
with respect to their performance on the 
personality inventory and the self-de-
scriptive section of the Bills’ scale. In the
former case the percentages of Ss re- 
sponding “Yes” to each item were com- 
puted for each group. Items showing large 
intergroup differences were subject to 
tests of significance and further analysis. 
In the latter case, choices for the self- 
descriptive operation were dichotomized 
into two groups, with choices 1-Seldom 
is this like me, 2-Occasionally this is like 
me, and 3-About half of the time this is 
like me being placed in the one group; and 
choices 4-A good deal of the time this is 
like me, and 5-Most of the time this is 
like me being placed in the other group. 
Percentages of Ss in each of these two cate- 
gories were computed for the four sepa- 
rate subgroups. Items showing large inter- 
group differences were subject to tests of 
significance and further analysis. 


RESULTS 


Table 1 reveals the goal-setting characteristics of the four
subgroups. The numbered subscripts 1, 2, and 3 refer to the first,
second, and third examinations of the course, while the letter
subscript “c” refers to aspiration estimates or grades for the
course as a whole. A two-way analysis of variance was employed for
all comparisons, using the method of unweighted means as described
by Snedecor (1956, p. 385). Interaction effects were omitted from
the table, since the only significant interaction was that for
LA[?], which had an F of 4.30 (p < .05). Since some previous
studies indicated the possibility of significant differences in LA
variances between the self-acceptant and self-rejectant groups and
between the underachieving and overachieving groups, the F test was
applied for both of these comparisons for LA1, LA2, LA3, LA1 - Gr1,
LA2 - Gr2, LA3 - Gr3, Gr1 - LA2, and Gr2 - LA3. With but one
exception no significant differences in variances appeared for any
of these comparisons. The one exception involved LA1, where it
appeared that overachievers were significantly (p < .05) more
variable in setting their first level of aspiration than
underachievers.

Inspection of Table 1 discloses some in- 
teresting differences in goal-setting be- 
havior for the four subgroups. Within the 
LA — Gr. category it can be observed that 
the overachievers come significantly closer 
to their level of aspiration than the under- 
achievers for Exams 1 and 3, for the Total 
LA — Gr. averages, and for the course as 
a whole, and that on Exam 3 the self-re- 
jectant group comes significantly closer to 
their level of aspiration than the self- 
acceptant group. Although the latter trend 
is observable for Exams 1 and 2 as well, 
it is not statistically significant for these 
exams, a circumstance which can probably
be related to the considerable discrepancy 
in LA — Gr. for the self-rejectant under- 
achiever in the early part of the semester 
and the subsequent progressive decrease 
in LA — Gr. discrepancy toward the end 
of the semester. This progressive decrease 
is, in turn, probably attributable to the 
fact that the self-rejectant underachiever 
readjusts her LA downward relatively 
more often and more severely than the 
others during the semester, as can be ob- 
served from the LA data. This is in 
contrast to the expected grades, which remain fairly constant
throughout the semester for all groups.

Differences between the goal-setting behaviors of the
self-acceptant and self-rejectant groups are even more clearly
revealed for the Gr. Exp. - Gr. data. Here it is evident that the
actual exam grades obtained are consistently lower than the
expected grades for the self-acceptant groups, while the
self-rejectant groups either have less of a negative discrepancy or
actually obtain grades higher than expected, a difference which is
significant at the .05 level for Exams 2 and 3 and for the Total
Gr. Exp. - Gr. averages. It is also interesting to note that the
order of magnitude of the Gr. Exp. - Gr. discrepancies is always
the same: self-acceptant underachiever (greatest), self-acceptant
overachiever, self-rejectant underachiever, and self-rejectant
overachiever (least).
Some other interesting differences até 
revealed by the Gr. — LA data. The dis- 
crepancy between present level of aspira- 
tion and previous grade obtained is S£ 
nificantly higher for the underachievers 
ior Gr, — LA,. When next computed, 
however, the picture is somewhat different, 
for with Gr, — LA, the self-rejectant 
group now has a significantly smaller dis- 
crepancy between present LA and prev 
ous grade than the self-acceptant grouP- 
Both trends are present for both sets © 
data, but each attains statistical signifi- 
cance only once. The reason for this ca? 
be traced to the behavior of the self-r°- 
jectant underachievers and the self-2e- 
ceptant overachievers. The self-rejecta? 
underachievers are the only ones for who™ 
the Gr. — LA, discrepancy is less ie 
the Gr, — LA, discrepancy, despite bon 
fact that they shared with others an aP 
preciable increase in percentage of D 
obtained from LA, to LA,. Quite in COP 
trast is the considerable increase e 
Gra — LA, to Gr, — LA, for the self 
acceptant overachievers. These s 
changes are sufficient to change the pie 
ture of statistical significance. the 
Inspection of Table 1 also discloses 
following additional information: 


1. There is a general trend for LA: 
lower than LA; , probably as a result O' 


to be 
f the 


GOAL-SETTING AND PERSONALITY 97 


e TABLE 1 
o 
Gr tags or Tap Goar-SEeTTING BEHAVIOR OF SELF-ACCEPTANT ÜNDERACHIEVERS, 
ELF-ACCEPTANT OVERACHIEVERS, SELF-REJECTANT UNDERACHIEVERS, AND 
SELF-REJECTANT OVERACHIEVERS 


Self-Acceptant Self-Rejectant 
o ; F for Selí- | F for Under- 
pu ; or A ara AEE Teeni 
Pur E Fide came Mts = A ii 
LA, 147 1 1.13 1.35 | 1.99 .82 
pee 2.73 1.54 2.27 2.43 ES 2.51 
LA: 3.13 1.82 3.00 2.70 | .91 4.21* 
: 2.57 1.99 3.07 9.22 | 1.79 4.18* 
Pu 7 Gr. Exp =1.18 —.75 | —2.13 | —1.04 | 6.40* 8.31** 
‘Aa — Gr. Exp.; “103 | -114 | -1.87 | —1.17 | 3.25 1.50 
Aa — Gr. Exp.s iio | —.89 | —133 | —-91 E 1.99 
LA - Gra mi | -1.36 | —2.47 | —1-04 | .04 7.32** 
LA!- Gra “Teo | —1.32 | —160 | —-39 | 1.02 2.62 
qs = Gri Zaat | —1.21 | —1.88 | —-70 | 5.02* 6.59* 
otal LA — Gr, —6.40 —3.89 —5.40 —2.3 | 2.35 11.86** 
(Exams 1 + 2 + 3) 
LA. — Fin, Gr. -an | 179 | 3.38 | —1.687 | 80 25.56** 
# Achieving LA, 13 43 7 35 
& Achieving LAs 33 36 47 57 
9 Achieving LAs 10 39 20 6l 
76 Achieving LA, 0 25 7 30 
Gr. Ex 
eet = Gr. —1.20 —.61 —.83 .00 | 2.81 1.08 
dr Exp. — Gris —.67 | —.18 127 WS | 4.30* 1.08 
Tours = Gra a7 | =e ‘00 729 | 5.91* 2.61 
(au Gr. Exp. — Gr. | -3.14 | 2.11 | —.06 100 | 8.71** | 2.90 
Xams 1 +2 E 3) 
Gr, * 
1 ~ LAs 1.93 1.07 1.78 73 | 3 4.81 
Sta — Ls 2.17 1.75 1.53 ‘s7 | 4d2* | 2.08 
n 
wel Grade 4.98 2.50 4.46 2.91 .00 19.81** 


Note. Numerical subscripts 1, 2, and 3 refer to the 1st, 2nd, and 3rd examinations of the course; letter subscript
a refers to the course as a whole.
Key to abbreviations used:
LA: Level of Aspiration
Gr. Exp.: Grade Expectancy
Gr.: Grade Actually Obtained
Fin. Gr.: Final Grade in course
* Significant at .05 level.
** Significant at .01 level.


low proportion of Ss attaining LA1.
This is followed by an increase in the
percentage of Ss attaining LA2, which is
followed in turn by an increase in LA3. (An
exception already noted is that for the
self-rejectant underachiever.)

2. The level of aspiration for the course is
always higher than LA1, though no


formal evaluation has occurred in the meantime.
Apparently just a few weeks of course
work encourage more realistic goal-setting.

3. The self-rejectant overachievers always
have the least discrepancy between level of
aspiration and grade actually obtained.

4. The self-rejectant overachievers are
consistently most likely to receive an exam
grade that fulfils or surpasses their stated
expectations.

5. The self-rejectant overachievers always 
have the least discrepancy between previous 
grade and present level of aspiration (which 
partially explains 3 and 4 above). 

6. The self-acceptant underachievers al- 
ways have the greatest discrepancy between 
previous grade and present level of aspira- 
tion. 

7. The self-acceptant underachievers al- 
ways have the greatest discrepancy between 
expected grade and grade actually obtained. 

8. For the first exam the discrepancy between
level of aspiration and grade expectancy
is significantly greater for the underachievers
as opposed to the overachievers
and significantly greater for the self-rejectant
as opposed to the self-acceptant. Though
these trends are observable again for the
second and third exams, they are not
significant for these, primarily because of the
progressive decrease in LA − Gr. Exp.
discrepancy for the self-rejectant underachievers.
The latter, however, are observed
to have the greatest discrepancy for all three
exams, despite this progressive decrease.
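
The discrepancy scores referred to in the points above can be illustrated with a minimal sketch. It assumes grades are coded on a numeric scale; the class name, function name, and the sample values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ExamRecord:
    """One student's statements and result for one exam (hypothetical coding)."""
    level_of_aspiration: float  # LA: grade the student hoped to attain
    grade_expectancy: float     # Gr. Exp.: grade the student expected to get
    grade_obtained: float       # Gr.: grade actually obtained


def discrepancies(exams):
    """Return one dict of discrepancy scores per exam, labeled as in Table 1."""
    rows = []
    previous_grade = None
    for exam in exams:
        row = {
            "LA - Gr. Exp.": exam.level_of_aspiration - exam.grade_expectancy,
            "LA - Gr.": exam.level_of_aspiration - exam.grade_obtained,
            "Gr. Exp. - Gr.": exam.grade_expectancy - exam.grade_obtained,
        }
        if previous_grade is not None:
            # Previous grade versus present level of aspiration (e.g., Gr1 - LA2).
            row["prev Gr. - LA"] = previous_grade - exam.level_of_aspiration
        rows.append(row)
        previous_grade = exam.grade_obtained
    return rows


# Hypothetical student with three exams; the numbers are illustrative only.
rows = discrepancies([
    ExamRecord(2, 3, 4),
    ExamRecord(3, 3, 4),
    ExamRecord(3, 4, 4),
])
```

Group means of such per-student scores would correspond to the rows of Table 1; the averaging step and the F tests are not shown here.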


Tables 2, 3, and 4 show some of the 
personality characteristics associated with 
membership in each of the four separate 


TABLE 2

SCORES ON THE TAYLOR MANIFEST ANXIETY SCALE FOR THE FOUR SUBGROUPS

                                              F Test for Self-       F Test for Under-
   Self-Acceptant        Self-Rejectant       Acceptance-Rejection   and Overachievement
 Under-     Over-      Under-     Over-
 Achievers  Achievers  Achievers  Achievers       F        p             F        p
 11.55      14.72      19.88      23.29         31.18     .001          4.72     .05

Note. The interaction effect was insignificant and
therefore omitted.


subgroups. Table 2 indicates that scores
on the Taylor Manifest Anxiety Scale
were significantly higher for the self-rejectant
Ss as opposed to the self-acceptant
Ss and for the overachievers as opposed
to the underachievers. The cumulative effect
of these two trends is to reveal the
self-rejectant overachiever as the highest-ranking
on the anxiety scale and the
self-acceptant underachiever as the lowest. It
is interesting to note that the order of
magnitude for these scores is exactly the
same as that for the Gr. Exp. − Gr. data
for all three exams, and also for the
Gr2 − LA3 data.

Table 3 shows those items in the personality
inventory that produced statistically
significant discriminations among the
four subgroups of the study. For each of
these items the numbers of Ss answering
“Yes” and “No” were computed for each
subgroup, and a chi square value was
computed for the resulting 2 by 4 table.
Table 4 shows those items in Bills’ Index
of Adjustment and Values that produced
statistically significant discriminations.
These same operations had been carried
out for the Bills’ scale, the responses being
dichotomized as described in a previous
section. However, certain problems arose
with the Bills’ scale that prompted a
somewhat different type of analysis. It
was evident that statistically significant
differences would appear for items on the
Bills’ scale by virtue of the fact that the
self-acceptant Ss would naturally rate
themselves higher on positive traits than
would the self-rejectant Ss. The same
trend is also observable for the data of
Table 3, but here the effect does not seem
to be as great. But for the Bills’ scale it
was decided to rank the items within each
group according to the percentage of positive
responses (Alternatives 4 & 5) and
then to add these rank order data to the
table for interpretive purposes. These
ranked data constitute a self-representation
by each group in terms of characteristics
seen as more typical or less typical of


TABLE 3

INVENTORY ITEMS HAVING SIGNIFICANT CHI SQUARE VALUES FOR THE FOUR SUBGROUPS


Percentage Answering "Yes" 


Item Self-Acceptant Self-Rejectant x: 
Under- Over- Under- Over- 
Achievers | Achievers | Achievers | Achievers 
E I'm faced with difficult task 
eee “give up” too easily. 10 7 50 17 15.37** 
hs I were more successful as a 
endent: 90 66 94 92  |10.27* 
qr to do well in anything I un- 
ertake, 97 90 56 92  |16.30** 
i more sensitive than most 
fuer people. 20 46 38 58 8.72* 
ap dently find myself worrying 
fi out something. 47 69 50 79 7.53% 
aai the standards I set for my 
n work are usually too high for 
d attain, 40 14 19 50 10.26* 
Sh I could be ash th 
seem TATA as happy as others " T S 5 Jurere 
M usuall i 
: weet, y calm and not easily 97 79 56 46 19.61** 
Suall expect to succeed in 
things T do, 87 100 88 75 7.788 
pue periods of such great rest- 
5 ness that I cannot sit long in d 
I ha, air, 30 21 56 54 9.48 
culties metimes felt that diffi- 
were piling up so high 
I are ud not overcome them. 50 55 69 79 8.39* 
ard to k ind on 
I A task or job, een may OH 16 14 63 38 17.09** 
Life ig Digh-strung person. 0 21 31 38 13.26* 
h Strain f 
1 time, in for me much of the f 7 a 33 14.74** 
ave stron ivati # 
Seed in ie laine to suc T rr ^t 92 12.09** 
es I thi 
all. ink I am no good at A a Fal n 18.00** 
am certaj : " 
confidence. 7 lacking in self- M 24 50 75 21.05** 
I prs C? rely self-confident. 40 24 19 oe 
PI ne ? good deal. 1 10 6 29 8.3 
ieee be able to live up to my 9.61* 
am lik, p pectations for me. 7 10 38 29 4 
Ga 
know xot by most people who - 97 si 88 1.13» 
a Approaches statistical significance where p.05 = 7.82; included here for its interpretive value in discriminating
the subgroups.
* Significant at .05 level.
** Significant at .01 level.


them as individual members of the group.
Three of the traits ("optimistic," "stubborn,"
and "competitive") were included
in Table 4 despite the fact that they were
not statistically significant, because it was
anticipated that these three would exhibit
appreciable intergroup differences in rank
even though statistical significance was absent.
This proved to be the case for two
of these three traits.
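
The chi square computation described for the 2 by 4 tables can be sketched as follows. This is a minimal illustration of the standard Pearson chi square statistic, not the author's worked computation: the "Yes"/"No" counts and the implied group sizes are hypothetical, and only the .05 critical value of 7.82 (3 degrees of freedom) is taken from the table footnotes.

```python
def chi_square(table):
    """Pearson chi square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts for one inventory item across the four subgroups
# (columns: the four subgroups; rows: "Yes" and "No" answers).
yes_counts = [3, 2, 8, 4]
no_counts = [27, 27, 8, 20]

statistic = chi_square([yes_counts, no_counts])
degrees_of_freedom = (2 - 1) * (4 - 1)
CRITICAL_VALUE_05 = 7.82  # value quoted in the table footnotes
is_significant_05 = statistic > CRITICAL_VALUE_05
```

With a 2 by 4 table the statistic has 3 degrees of freedom, which is why a single critical value applies to every item in Tables 3 and 4.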

Inspection of Table 3 along with the
ranked data of Table 4 discloses some
interesting differences between the four
subgroups. The self-acceptant underachiever
rates herself higher than the
others on such traits as happiness,
self-confidence, alertness, being less sensitive
than others, worrying less, being calm and
not high-strung or easily upset, and not
finding life much of a strain. But she rates
herself low on ambition. The self-acceptant
overachiever rates herself high on maturity,
worthiness, competence, and industry,
considers herself well-liked, and


TABLE 4 
ITEMS IN THE "INDEX OF ADJUSTMENT AND VALUES" HAVING SIGNIFICANT CHI SQUARE
VALUES FOR THE FOUR SUBGROUPS


Percentage Choosing Alternatives thi 
ua” and “5” Item Rank Within Group 
Tini Self-Acceptant Self-Rejectant x: Self-Acceptant Self-Rejectant 
Under- 
icu |, Over- | Under- | Over- C hiev 
Achiev- | achievers | Achievers | Achievers Achievers | Achievers | Achievers | Mani 
a 
aeceptable | 100 | 90 87 46 |29.55** | 1 5 1 9 
alert 90 72 25 37 27.309*| 5 14 15.5 | 14 
ambitious 56 79 31 58 | 11.58** | 20 13 11.5 | 4 
calm 81 | 69 62 37 |13.98**| 9 15 3.5 | 14 
competent 81 86 31 37 |19.34* | 9 7.5 | 11.5 | 14 
confident, 81 | 66 19 12 |34159* | 9 16.5 | 19.5 | 23 
helpful 81 | 93 50 51 | 15.48%* | 9 3 5.5 | 6.5 
logical. T | 66 | 25 | 42 |1422] 12.5 | 16.5 | 15.5 |11 
industrious 58 ; $ 5 
86 25 33 |19.47** 17 
47) 18.5 | 7.5 | 15.5 
merry 90 | 86 62 58 |1112 | 5 7E | E I4 
mature 90 | 97 37 37 |36.2| 5 i 8.5 |14 
nervous | 10 | 14 19 | 37 | 7.609 | 23.5 9.5 |14 
optimistic 61 3s - 
pini 62 50 29 | 6.00 | 16.5 | 19.5 | 5.5 | 19 
Lo cis n 7 37 29 |11.99** | 14.5 | 19.5 8.5 |19 
2 *: 
eens 8 | 19.05** | 21.5 | 22 22.5 | 24 
ing 61 | 83 19 46 |18.86** | 16.5 9 
stable 93 | 83 44 67 | 16.93**| 2.5 ur 7 13 
chdious 42 | 62 6 46 | 13.21**| 21.5 | 19.5 | 24 9 
successful "1| 8 31 21 |2117**| 145 | 12 11.5 |215 
stubborn 10 | 17 19 21 | 1.64 | 23.5 | 23 19.5 | 21-5 
worthy 77 | 98 31 29 |32.75*| 12.5 | 3 11.5 | 19 
broad- i 
minded 93 | 93 69 62 |1331'*| 25 | 3 2 2 
competitive | 58 | 62 | 25 | 58 | 6.56 | 185 | 195 | 155 | 4 
motivated | 81 | 86 12 54 29.69** | 9 T5 | 22.5 | 65 


a Approaches statistical significance where p.05 = 7.82; included here for its interpretive value in discriminating
the subgroups.
* Significant at .05 level.
** Significant at .01 level.


rates herself extremely high on her
expectations "to succeed in things I do." The
self-rejectant underachiever rates herself
low on motivation and high on "giving up"
too easily. She is more likely to report that
she is not studious or “hardworking.” She
is also more likely to report that she finds
it hard to keep her mind on a task and
that at times she thinks she is no good at
all. In spite of this she ranks calmness
and optimism high in her self-ratings. The
self-rejectant overachiever rates herself
high on ambition, motivation in schoolwork,
competitiveness, nervousness,
and tendency to brood.
She rates herself low on confidence,
success, acceptability,
maturity, and worthiness. She is also more
likely to report unhappiness, being
high-strung and easily upset, feeling
overwhelmed by difficulties, setting standards
too high to attain, not expecting to succeed
in things she does, and finding life a strain
"much of the time."

Since the data of Table 4 had been
ranked to facilitate interpretation, it
seemed fruitful to take one further step
and compute intergroup rank-difference
correlations to determine the degree of
similarity and difference between the rankings
of the four groups. These data are
shown in Table 5. One notable difference
here is that the self-rejectant overachiever
is least similar to all the others for these
selected

se € 
» Ambition, studiousness, feelings of 
: Teme in pe es immaturity, 
"lecess fec]: of self-confidence, lack of 

Ings, and nervousness and anxi- 


Lb. ss 

Achia er Sid that the self-rejectant over- 
i aracteristi CS stability as the trait most 
en th a of herself, and one suspects 
n itivates i ES 80 emotionally loaded that 
aj hose. who * strongest defensive reactions 
unth ficient) are the most unstable. It is 

ink; Y nondescriptive to encourage 


SPonses E 
ses, (and generally positive) re- 


1 
H 


TABLE 5

INTERGROUP CORRELATIONS FOR THE RANKED DATA OF TABLE 4

                                  Self-Acceptant          Self-Rejectant
                                  Under-     Over-        Under-     Over-
                                  Achievers  Achievers    Achievers  Achievers
Self-Acceptant  Underachievers               .71          .65        .40
                Overachievers                             .52        .39
Self-Rejectant  Underachievers                                       .33
                Overachievers


ety. These data should be considered in 
the light of the fact that the self-rejectant 
overachievers are definitely a below aver- 
age group intellectually, having a median 
ACE percentile rank of only 35. 
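
A rank-difference correlation of the kind reported in Table 5 can be sketched with the classical formula rho = 1 − 6Σd²/(n(n² − 1)), where d is the difference between two groups' ranks for the same item. The function name and the two rank lists below are hypothetical, and it is an assumption that this is the exact computing formula the author used.

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Rank-difference correlation between two sets of ranks for the same items."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("both rank lists must have the same length (at least 2)")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of five traits within two groups (illustrative only).
group_x_ranks = [1, 2, 3, 4, 5]
group_y_ranks = [2, 1, 3, 5, 4]

rho = rank_difference_correlation(group_x_ranks, group_y_ranks)
```

The formula is exact only when there are no tied ranks. The ranks in Table 4 include ties (for example 15.5), so values computed this way from tied data are approximations.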


Discussion 


Data have been presented concerning 
the goal-setting patterns of the four sub- 
groups and also the personality charac- 
teristics distinguishing the subgroups. An 
attempt will now be made to interpret 
these empirical relationships and portray 
their meaning for each of the groups in- 
volved. 

Some significant trends are evident for
the data as a whole. By inspection it is
evident that self-rejection and manifest
anxiety are related. (An earlier investigation
revealed a significant correlation of
.41 between these two variables.) Both of
these are related in turn to a cautious and
conservative pattern of goal-setting, the
most anxious, self-rejectant Ss setting
their new LAs closest to the grades obtained
on the previous exam. It is probably
because of this that these Ss more
often exceed their stated grade expectations
than is typical of the less anxious,
self-acceptant subjects, whose actual
grades are characteristically lower than
their stated grade expectations. One
explanation for this is that the self-rejectant
person tends to be relatively unconfident
and pessimistic when predicting her grade,
and she is so conservative in her predictions
that she often performs better than
she expects. The self-acceptant person, on
the other hand, may expect more and
generally get less. An alternative but related
explanation would be that the already
anxious, self-rejectant person is
consciously or unconsciously aware that
critical nonattainment of goals would
create additional tension and anxiety with
which she could not cope, and to prevent
this from occurring she lowers her aspirations
accordingly. It is probable that both
of these operate in some measure.
Operating within this general trend are
several specific tendencies that can be
noted for each of the separate subgroups:


The Self-Acceptant Underachiever: There 
is a strong suggestion in the data that the 
self-acceptant underachiever either fulfils 
her ego needs in areas other than the aca- 
demic or that she can somehow achieve self- 
satisfaction and contentment more easily 
(and with less justification) than the average 
person. Her calm, unworried, self-confident 
kind of temperament permits her to aspire 
to grades that are further beyond actual 
previous attainment than those for any other 
group. However, she is not especially am- 
bitious or willing to exert much effort in 
order to achieve her goals, and she therefore 
also falls further short of her expectations 
than those in any other group. She often 
recognizes that she has not attained the 
standards she set for herself, but nonattain- 
ment is apparently of little concern to her, 
since she is the least anxious of all those 
represented in the study. Doubtlessly she is 
well-adjusted, but she falls far short of realiz- 
ing her potentialities as an individual. 

The Self-Acceptant Overachiever: The 
self-acceptant overachiever can probably 
best be characterized as the well-adjusted, 
“good” student. Although often of mediocre 
intelligence, she consistently aspires to the 
highest grades, and her actual grades are 
generally the highest obtained by any of the 
subgroups. She is a hard working, industrious 

person whose diligence has assured success 


in the past and who therefore just expects 
further success in the future. She probably 
obtains a large measure of ego-satisfaction 
through her academic work. She can put 
pressure on herself when she needs it most; 
toward the end of the course she considerably 
increased the gap between the last exam 
grade and her current LA and then went on 
to come as close or closer to this new aspi- 
ration than she ever had before. She sees 
herself as a mature, worthy, well-liked per- 
son who is competent to deal with life's 
challenges and can do so without undue 
nervousness or anxiety. Unlike the self-rejectant
overachiever, her motivational system
is not characterized by a strong sense of
competitiveness.

The Self-Rejectant Underachiever: The
self-rejectant underachiever is, without
doubt, the least motivated of all. She has the
highest median ACE score of all four subgroups;
yet she is more likely to admit that
she does not try too hard, that she tends to
“give up” too easily, and that she cannot
keep her mind on her work. She ranks herself
relatively high in “optimism,” however, and
both this optimism and her acknowledged
tendency to give up too easily are reflected in
the manner in which she sets her goals. She
starts out bravely enough and demonstrates
her initial "optimism" by setting an LA
that is further from her expected grade than
those for other groups, and this LA − Gr.
Exp. gap continues to be greatest on the
succeeding exams. But only 7% of the
self-rejectant underachievers achieve LA1, and
there ensues a rather precipitous drop in LA
for succeeding exams. The drop is demonstrably
greater from LA1 to LA2 than for any
other group, and from LA2 to LA3 all the
other groups are actually increasing their
LAs appreciably, while that for the
self-rejectant underachiever continues to decrease.
This tendency to pull down the LA,
obviously related to present lack of motivation
as well as initial overestimation, is
also indicated in the data relating current
LA with previous exam grade, where it is
observed that the discrepancy increases over
time for all the other groups but actually
decreases for the self-rejectant underachiever.
This has the effect of progressively decreasing
the LA − Gr. Exp. and LA − Gr.
discrepancies, but only because of the drop in
LA and certainly not because of increased
motivation and willingness to work hard. As
a matter of fact, the evidence could more
reasonably be interpreted as pointing to a
progressive decrease in motivation throughout
the semester. Perhaps it is because


of this goal-setting behavior, her
lack of motivation, and her chronic
nonachievement that the self-rejectant
underachiever thinks at times that she is “no good
at all.” This may also help to explain her
high manifest anxiety score. Why, then, does
she rate herself as relatively
calm, optimistic, and not inclined to brood or
worry? One suspects that the self-rejectant
underachiever expends much psychic energy
in defensive maneuvers of one kind or another;
it is even possible that her great LA − Gr.
Exp. discrepancy connotes a level of aspiration
that is essentially defensive
in nature, more so than for any other group.
It would not be surprising if a psychic life
characterized by strong repressive tendencies
and defensive reactions were to result in
manifest anxiety and equally manifest
underachievement.
The Self-Rejectant Overachiever: Probably
the most depressing picture of all is
presented by the self-rejectant overachiever.
Her self-description indicates that she is
more nervous, more high-strung,
and more easily upset than those
in the other subgroups; and her manifest
anxiety score, which is the highest of all
four subgroups, confirms this self-description.
She feels incompetent, unsuccessful,
and completely lacking in self-confidence.
She is often inclined to
brood over her inadequacies and her troubles
and feels overcome by life in general.
Her anxiety and lack of self-confidence
are reflected in her goal-setting behavior,
since she consistently sets her goals more
cautiously and conservatively than those in the
other subgroups. She sets her LAs far
closer to the previous grade obtained than the
others, and she always has the least
discrepancy between LA and grade obtained.
She also more often exceeds her expected
grade than anyone else. She acknowledges
that she is ambitious, studious, highly motivated,
and extremely competitive; and the
combination of this with her cautious
goal-setting behavior is probably responsible for
the smaller LA − Gr. discrepancy and the
tendency to exceed grade expectations.
Characteristically, however, she reports
more often than the others that "I
find the standards I set for my own work
are usually too high for me to attain" despite
the fact that she comes closer to attaining
them than anyone else! Faced with
deep-seated feelings of unworthiness, the
self-rejectant overachiever apparently spends
much of her energy in a struggle to achieve
a shred of self-respect. This struggle


takes the form of a desperate and highly 
motivated effort to achieve academic ex- 
cellence. This must be a very difficult task 
for many members of this group, since the 
median ACE percentile rank is only 35, and 
the combination of mediocre ability and 
desperate desire for academic achievement 
must only create further tension and anxiety 
that produce additional misery in an al- 
ready unhappy life. Doubtlessly it is be- 
cause of this extreme emotional investment 
in goal-attainment that the self-rejectant 
overachiever sets her goals so conservatively 
and then tries so desperately to attain them. 
Her sense of personal unworthiness is deeply 
felt, and that is why she must put forth so 
much effort to convince herself and others 
that she is a worthy person after all. 


Summary


The purpose of this investigation was 
to analyze the patterns of goal-setting be- 
havior and related personality character- 
istics of four groups: self-acceptant un- 
derachievers, self-acceptant overachievers, 
self-rejectant underachievers, and self-re- 
jectant overachievers. Self-acceptance and 
rejection were measured by the Bills' 
Index of Adjustment and Values. Ss 
were 100 female college students enrolled 
in a course in educational psychology. 
Three examinations were given during the 
course, and before each of these each S was 
asked to indicate the grade she hoped to 
attain on the exam and the grade she actu- 
ally expected to get. In addition to the 
Bills’ scale a personality inventory was ad- 
ministered that included the items of the 
Taylor Manifest Anxiety Scale and other 
items relating to mental health and desire
to achieve. Among other results reported,
it was noted that self-rejectant Ss
had a significantly smaller discrepancy between
previous grade and present level of
aspiration, and there was also an accompanying
trend for the self-rejectant Ss to
exceed their expected grades while the
self-acceptant consistently overestimated
theirs. Since underachievement was also
associated with overestimation of grade,
the self-acceptant underachiever led all
four groups in gross overestimation, while
the self-rejectant overachiever led the
others in either achieving or exceeding the
expected grade. Significant differences in
manifest anxiety were revealed, with the 
underachievers and self-rejectant Ss ex- 
hibiting greater anxiety, a condition that 
was reflected in the more cautious and 
conservative goal-setting that character- 
ized these groups. Careful analysis of the 
items in the Bills’ scale and in the person- 
ality inventory revealed many interesting 
differences between the four subgroups. 


REFERENCES 


Bills, R. E. An index of adjustment and
values. J. consult. Psychol., 1951, 15,
257-261.

Bills, R. E. A comparison of scores on the
index of adjustment and values with behavior
in level of aspiration tasks. J.
consult. Psychol., 1953, 17, 206-212.

Cohen, L. D. Level of aspiration behavior
and feelings of adequacy and self-acceptance.
J. abnorm. soc. Psychol., 1954,
49, 84-86.

Frank, J. D. Some psychological determinants
of the level of aspiration. Amer.
J. Psychol., 1935, 47, 285-293.

Gardner, J. W. The relation of certain personality
variables to level of aspiration.
J. Psychol., 1940, 9, 191-206.

Gould, Rosalind, & Kaplan, N. The relationship
of level of aspiration to academic
and personality factors. J. soc.
Psychol., 1940, 11, 31-40.

Gruen, E. W. Level of aspiration in relation
to personality factors in adolescents.
Child Develpm., 1945, 16, 181-188.

Holtzman, W. H., Calvin, A. D., & Bitterman,
M. E. New evidence for the validity
of Taylor's manifest anxiety scale.
J. abnorm. soc. Psychol., 1952, 47, 853-854.

Miami University. Second report: Experimental
study in instructional procedures.
Oxford, Ohio: Miami Univer., 1957.

Mitchell, J. V., Jr. Teaching educational
psychology by TV. Improving College
and University Teaching, 1958, 6, 90-9 x 

Sears, Pauline S. Level of aspiration in relation
to some variables of personality:
clinical studies. J. soc. Psychol., 1941, 14,
311-336.

Snedecor, G. W. Statistical methods. Ames,
Iowa: Iowa State Coll. Press, 1956.

Taylor, Janet A. A personality scale of
manifest anxiety. J. abnorm. soc. Psychol.,
1953, 48, 285-290.


Received October 16, 1958. 




Journal of Educational Psychology
Vol. 50, No. 3


PREDICTING LEADERSHIP RATINGS FROM 
HIGH SCHOOL ACTIVITIES¹


JOHN D. KRUMBOLTZ 
Michigan State University 


RAYMOND E. CHRISTAL AND JOE H. WARD, JR.
Personnel Laboratory, Wright Air Development Center 


Can a questionnaire about high school
activity participation be used to predict
later leadership peer ratings? A review
of the literature (Krumboltz, 1957a)
concerning the relationship of high school
activities to leadership criteria revealed
no conclusive evidence one way or the
other. There was some evidence that
activities were predictive of future
success, but the studies dealing with
high school activities were of such a
nature that no valid conclusions could
be drawn. More recently, a study
(Krumboltz & Christal, 1957) at the Air
Force Academy revealed correlations
ranging up to .28 between high school
activity participation and leadership
ratings. The present study was designed to
determine whether high school activity
participation can be used to predict future
leadership ratings within the aviation
cadet population. It is an attempt to verify
the results of the Air Force Academy
study on a different population. However,
the present study differs in three
respects. First, a special inventory form
was devised by which the examinee records
his high school activity participation
data on a Mark Sense card. Second,
data for individuals from large and
small high schools were analyzed separately.
Finally, individual items were selected
and combined by an iterative multiple
regression technique to produce the
highest possible relationship with the
criterion.

¹ This report is based on work done under
ARDC Project No. 7719, Task No. 17009,
in support of the research and development
program of the Personnel Laboratory,
Wright Air Development Center, Lackland
Air Force Base, Texas. Additional support
was received from an All-University Research
Grant from Michigan State University.

The establishment of any positive rela- 
tionship between high school activity par- 
ticipation and leadership criteria does not, 
of course, necessarily indicate a cause and 
effect relationship. If a successful predic- 
tion can be made, this fact has important 
implications in a selection program 
whether or not leadership ability may have 
resulted from the training received in high 
school activities. 


Sample 

The total sample consisted of 956 avia- 
tion cadets in preflight training at Lack- 
land Air Force Base, Texas. Of these, 857 
graduated from preflight training, while 
99 were eliminated for a variety of reasons. 
Since students from smaller high schools 
generally have fewer high school
activities available (although perhaps
more opportunity to participate in those 
activities which are available), students 
from large and small schools were ana- 
lyzed separately. Each cadet was asked to 
indicate approximately how many persons 
there were in his graduating class includ- 
ing midyear and summer session gradua- 
tion. If he reported 99 or less, he was ar- 
bitrarily considered to be from a “small” 
high school. If he reported 100 or more, 
he was arbitrarily considered to be from 
a “large” high school. 

Six aviation cadet classes were involved
in this study. The first three, Classes V-15,
A-16, and B-17, who entered training in
June and July of 1956, were grouped together
and are hereinafter referred to as
Group A. The three classes who entered
training in July and August immediately
after Group A, Classes C-18, D-19, and
E-20, are hereinafter referred to as Group
B.
Thus, four subsamples were formed out 
of the 857 graduates for analysis purposes: 
1. Small high school, Group A, N = 162; 
2. Small high school, Group B, N = 135; 
3. Large high school, Group A, N = 306; 
4. Large high school, Group B, N = 254.
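
The assignment of a cadet to one of the four subsamples can be sketched as below. The cutoff of 99 graduates, the class designations, and the subsample sizes are taken from the text; the function and constant names are hypothetical and chosen for illustration only.

```python
SMALL_SCHOOL_MAX = 99  # "99 or less" was treated as a small high school

GROUP_A_CLASSES = {"V-15", "A-16", "B-17"}
GROUP_B_CLASSES = {"C-18", "D-19", "E-20"}


def school_size(graduating_class_size):
    """Classify a high school as 'small' or 'large' from reported class size."""
    return "small" if graduating_class_size <= SMALL_SCHOOL_MAX else "large"


def training_group(cadet_class):
    """Map an aviation cadet class designation to Group A or Group B."""
    if cadet_class in GROUP_A_CLASSES:
        return "A"
    if cadet_class in GROUP_B_CLASSES:
        return "B"
    raise ValueError(f"unknown cadet class: {cadet_class}")


def subsample(graduating_class_size, cadet_class):
    """Return the (school size, group) pair that identifies a subsample."""
    return school_size(graduating_class_size), training_group(cadet_class)


# Subsample sizes as reported in the text.
REPORTED_N = {
    ("small", "A"): 162,
    ("small", "B"): 135,
    ("large", "A"): 306,
    ("large", "B"): 254,
}
total_graduates = sum(REPORTED_N.values())
```

The four reported subsample sizes add up to the 857 cadets who graduated from preflight training.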


The High School Activities Inventory 


The High School Activities Inventory 
was devised to measure the extent and na- 
ture of each individual’s participation in 
his high school’s extracurricular (or co- 
curricular) activity program. Since extra- 
curricular programs vary considerably in 
different parts of the country and among 
different high schools in the same geo- 
graphical region, it was necessary to de- 
vise a questionnaire that could be inter- 
preted equally well by anyone attending 
high school anywhere in the United States 
or its territories. Fortunately, it was pos- 
sible to obtain the records of applicants 
to the Air Force Academy Class of 1959. 
Each applicant had written out a list of 
all his extracurricular activities. These lists 
were then consolidated into a master list 
of 226 separate activities. The activities 
ranged from polo to chess and many of 
them were listed by only one individual. 
By combining very specific activities into 
more general categories and by eliminating 
rare types of activities, it was possible to 
reduce the list to a more manageable 44 
activities which were included in the final 
inventory. Actually there were 70 items 
since the extent of one’s participation was 
often included as a separate item under 
one given activity. For example, partici- 
pating in football and winning a letter in 
football were two separate items. 

Each item takes the form of asking how 




many years an individual participated in 
a given activity or how many years he re- 
ceived a certain honor. The initial instruc- 
tions directed the examinees to consider 
only the last three years of their high 
school career in answering how many 
years they participated. Thus, there were 
only four possible answers to each item: 
0, 1, 2, and 3. In this way individuals who 
attended four-year high schools had no 
“time” advantage over individuals who 
attended three-year high schools. 

It was considered possible that some
cadets might try to exaggerate the extent
of their extracurricular participation although
there was no reason why they
should. To discourage this possibility each
cadet was asked to list the names and locations
of the high schools he attended,
the names of the high school principals,
and the names of three teachers at each
high school who could verify his activities.
Although no attempt was made to
follow up this information, it was believed
that the possibility of verifying a cadet's
responses made it less likely that he would
exaggerate them.

Responses and identifying data were
recorded on mark sense cards (C
850259MS-O). A list of 20 of the 70 items
may be found in Table 1 in summarized
form. For the most part each item is
self-explanatory if it is understood that each
activity in Table 1 is preceded by “How
many years did you participate in ....” or
“How many years did you receive this
honor ....”

To determine how well the inventory
covered the wide variety of possible
activities the first 200 cadets to answer the
inventory were asked to write on the back
of the inventory any activities in which
they had participated but which were not
covered in the inventory. A total O' of 
comments were received including Uo of 
more comments from some cadets- t 
the comments were judged to be irrelev er 
for a variety of reasons, while B 
considered as legitimate criticisms © 


ntory 


Ts 




the inventory. The irrelevant comments concerned activities the
questionnaire was not designed to cover such as [illegible] athletics,
high school fraternities, scholastic awards, and church activities.
Other irrelevant comments included specific honors or activities which
could not be classified, activities which had been covered in the
inventory directly or indirectly, and explanations of little or no
participation. The main type of activity that was omitted by the
inventory concerned school service activities. [Number illegible]
persons listed the visual aids department, five were managers of
[illegible], seven served on various school [illegible], and four
listed other miscellaneous school activities.


Criterion 


Leadership peer ratings were collected from the cadet sample twice
during their preflight training. Each man in a flight was asked to
rank-order each other man in his flight on the basis of his
leadership. The first time was after the first four weeks of training,
and the second time was after the first ten weeks of training. Cadets
were rank-ordered on the basis of the pooled rankings within each
flight, and the rank orders were then converted to T scores. The
stability of these rankings over the six-week interval is indicated by
correlations for the six classes ranging from .40 to .82 with a median
r of about [illegible].

A number of studies (Hollander, 1957; Trites & Sells, 1957; United
States Department of the Army, 1949, 1952) have demonstrated the
reliability and validity of peer ratings as an intermediate criterion
of future leadership behavior. The results of these and other studies
have been briefly discussed elsewhere (Hollander, 1957; Krumboltz,
1957a). It may be concluded from these studies that leadership peer
ratings bear a substantial relationship to future leadership ratings
collected in an on-the-job situation.
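The conversion of rank orders to T scores described above can be sketched as follows. The text says only that rank orders were converted to T scores, so the particular normalization used here (the midpoint proportion for each rank, passed through the inverse normal curve and rescaled to mean 50 and SD 10) is an assumption, and the function name `ranks_to_t_scores` is hypothetical. The two ratings are then summed, which is in keeping with the summed T-scored criterion reported in the Summary.

```python
from statistics import NormalDist

def ranks_to_t_scores(ranks):
    """Convert rank orders within one flight (1 = highest) to normalized T scores.

    Assumed method: each rank is turned into the proportion of the flight
    falling below the midpoint of that rank, then into a normal deviate.
    """
    n = len(ranks)
    t_scores = []
    for rank in ranks:
        # Midpoint proportion keeps every value strictly between 0 and 1.
        proportion_below = 1.0 - (rank - 0.5) / n
        z = NormalDist().inv_cdf(proportion_below)
        t_scores.append(50.0 + 10.0 * z)
    return t_scores

# Hypothetical flight of three men rated on two occasions.
first_rating = ranks_to_t_scores([1, 2, 3])
second_rating = ranks_to_t_scores([2, 1, 3])
criterion = [a + b for a, b in zip(first_rating, second_rating)]
```

Under this sketch a man ranked in the middle of his flight on both occasions would receive a criterion score of 100.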




Statistical Methodology 


Two techniques were used to analyze 
the data. The first consisted of summing 
the weighted responses to each item (i.e.,
adding the number of years recorded for 
each item) and computing the zero order 
correlation of this sum with the criterion. 
This was done separately for athletic 
items and for nonathletic items for each 
of the four samples. 
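The first technique can be illustrated with a minimal sketch. The responses and criterion values below are hypothetical, as are the function names; only the procedure (sum the years recorded for each item, then correlate the sum with the criterion) is taken from the text.

```python
from math import sqrt

def activity_score(item_years):
    """Sum the number of years (0 to 3) recorded for each item."""
    return sum(item_years)

def pearson_r(x, y):
    """Zero order (product-moment) correlation between two lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: four cadets, three items each.
responses = [[3, 0, 1], [0, 0, 0], [2, 1, 0], [1, 1, 1]]
criterion = [110, 90, 105, 100]

scores = [activity_score(row) for row in responses]
r = pearson_r(scores, criterion)
```

In the study this computation would be repeated separately for the athletic items and the nonathletic items within each sample.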

The second technique consisted of se- 
lecting and weighting items by an iterative 
multiple regression technique so as to 
produce the highest possible relationship 
with the criterion. The basic procedure 
that was used to obtain the regression coefficients
has been described by Greenberger and Ward (1956). This technique
starts with all regression coefficients equal 
to zero. At each iteration the unique cor- 
rection is computed for each regression 
coefficient that will maximally increase the 
multiple correlation. The alteration is 
made only on the one particular variable 
which has the largest correction. 
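The iteration described above can be sketched as follows. This is an illustration under stated assumptions, not the program of Greenberger and Ward: all variables are assumed to be standardized, so that the inputs are the matrix of predictor intercorrelations `R` and the vector of validities `v`, and the function names are hypothetical. With standardized variables, changing one weight by its correction (the validity minus the part already accounted for by the current weights) raises the squared multiple correlation by the square of that correction, so the variable with the largest absolute correction is the one altered.

```python
def iterative_weights(R, v, n_iter):
    """One-variable-at-a-time solution for standardized regression weights."""
    k = len(v)
    b = [0.0] * k  # start with all regression coefficients equal to zero
    for _ in range(n_iter):
        # Correction for each weight, given the current weights.
        corrections = [
            v[j] - sum(R[j][i] * b[i] for i in range(k)) for j in range(k)
        ]
        # Alter only the variable with the largest correction.
        j = max(range(k), key=lambda i: abs(corrections[i]))
        b[j] += corrections[j]
    return b

def squared_multiple_r(R, v, b):
    """Squared multiple correlation implied by weights b (standardized case)."""
    k = len(v)
    bv = sum(b[j] * v[j] for j in range(k))
    bRb = sum(b[i] * R[i][j] * b[j] for i in range(k) for j in range(k))
    return 2.0 * bv - bRb

# Hypothetical two-predictor example.
R = [[1.0, 0.5], [0.5, 1.0]]
v = [0.4, 0.3]
after_one = iterative_weights(R, v, 1)
converged = iterative_weights(R, v, 200)
```

Stopping early leaves most weights at zero, which is how the procedure selects a subset of items; continuing on the retained variables approaches the least squares solution.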

For each of the four samples it was first 
necessary to compute the column vector 
of validities. The iterative technique then 
proceeds to indicate the column vectors 
of intercorrelations that are successively 
required for the regression computation. 
As a result, if there are only k predictor
variables used out of the 70 original potential
predictors, then it is necessary to
compute only the 70 by (k + 1) rectangular
matrix of correlation coefficients.

The problem associated with “when to 
stop iterating” is as yet unsolved; how- 
ever, it has been observed that an F test 
for the significance of the increase of pre- 
dictive efficiency gives an approximate in- 
dication of a "good" place to stop. After 
the "best" set of k variables had been
selected for each sample, the iterations 
were continued on the variables that were 
retained until a least squares solution was 
reached for the k predictors. The regres- 
sion equation computed for each of the 




J. D. KRUMBOLTZ, R. E. CHRISTAL, AND J. H. WARD, JR. 


TABLE 1
RELATIONSHIP OF HIGH SCHOOL ACTIVITY ITEMS TO THE LEADERSHIP CRITERION

                                          Correlation with Criterion in
                                        Small HS            Large HS        Mean No.
Item                                    A         B         A         B     of Years
No.  Activity                        (N = 162) (N = 135) (N = 306) (N = 254) per Man

 1   Football                          .18*      .18       .14*      .19      .80
 3   Basketball                        .04       .29*      .13       .25*     .56
 7   Track                             .18       .32       .07      -.02      .46
13   Tennis                            .05      -.08     [illeg.]*   .00    [illeg.]
15   Swimming                          .04       .20*      .02       .06      .11
19   Other sport                      -.16*     -.02       .03      -.01      .24
23   Newspaper assistant editor       -.15*     -.04*     -.18       .14      .09
25   Newspaper photographer           -.07      -.07      -.17*      .02      .04
34   Debate team                       .14*      .18*    [illeg.]   -.08      .10
36   Chorus or glee club               .07       .05      -.01       .18*     .52
41   Science club member              -.08      -.23*     -.08       .04      .31
44   Language club member              .00       .16*      .00       .12*     .22
47   Hobby or interest club member    -.15     [illeg.]   -.04      -.09*     .49
57   Student government president     -.06       .11     [illeg.]    .20*     .06
58   Student government: other
       officer                        -.12*     -.07       .08       .14      .14
59   Student government: member       -.03       .09*      .15       .11      .34
63   President of class               -.02       .06       .16       .23*     .14
64   Vice-president of class           .09       .20       .18*      .14      .12
65   Secretary of class                .03     [illeg.]    .15*     -.06      .07
69   Outstanding student award         .18*     -.01      -.04       .00    [illeg.]

* Items weighted into multiple regression equations.


four samples was then applied to all samples
for cross validation.
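The F test mentioned above as a rough guide to stopping can be sketched as follows. The text does not give the exact form of the test, so the usual F ratio for the increase in squared multiple correlation is assumed here; the function name and the numerical values are hypothetical.

```python
def f_for_r2_increase(r2_reduced, r2_full, n, k_full, k_added=1):
    """F ratio for the gain in R squared when k_added predictors are added.

    Assumed form: ((R2_full - R2_reduced) / k_added)
                  / ((1 - R2_full) / (n - k_full - 1)).
    """
    df_numerator = k_added
    df_denominator = n - k_full - 1
    f_ratio = ((r2_full - r2_reduced) / df_numerator) / (
        (1.0 - r2_full) / df_denominator
    )
    return f_ratio, df_numerator, df_denominator

# Hypothetical case: a fifth predictor raises R squared from .20 to .25
# in a sample of 162 cadets.
f_ratio, df1, df2 = f_for_r2_increase(0.20, 0.25, n=162, k_full=5)
```

Because the variable entered at each step is the one chosen to give the largest gain, such an F ratio can serve only as an approximate indication, as the text notes.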


RESULTS AND DISCUSSION


The correlations of each of 20 items
with the criterion for each of the four samples
are reported in Table 1.² Asterisks


² A four-page table for all 70 items and
including the extent of participation by
eliminees as well as graduates has been deposited
with the American Documentation
Institute. Order Document No. 5955 from
ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress,
Washington 25, D. C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies.
Make checks payable to Chief, Photoduplication
Service, Library of Congress.


beside a correlation indicate that the item was weighted into the
multiple regression equation for that sample. The 20 items appearing
in Table 1 were selected from among those judged to be of most
interest and yielding the most substantial relationships with the
criterion.

Positive correlations were consistently obtained for most of the major
sports and for athletic honors. In small high schools being assistant
editor on the school newspaper received negative weight, while being a
member of the debate team received positive weight. In large high
schools being president or vice-president of the class was
consistently and positively related to the criterion, while such
consistent relationships were not found for the small high school
samples. Being a member or officer in student government was
positively related to the criterion in large high schools but not in
small high schools.

Many activities that one might reasonably expect to be positively
related to leadership ratings showed no relationship whatsoever. For
example, all of the following were unrelated to the criterion: being a
delegate to Boys’ State; being president of a hobby or interest club,
a language club, or a science club; and being editor of the school
newspaper or yearbook. The reasons for this are a matter for
conjecture. Perhaps individuals of relatively low leadership ability
join such activities. Being president of a club in a group of
nonleaders would then give no indication of one's leadership relative
to a larger more representative group. Perhaps leadership is specific
to the [situation].

By summing each individual's responses to items 1-20 and 70, an
Athletic Activity Score was computed for each individual. Similarly,
the sum of each cadet's responses to items 21-69 provided his
Nonathletic Activity Score. The sum of the two scores was taken as the
Total Activity Score. Table 2 reports the zero order correlations of
each score with the criterion and the correlations of the Athletic and
Nonathletic Activity Scores for each sample. The correlations of the
Athletic Activity Score with the criterion ranged from .10 to .29 and
were consistently higher than the correlations of the Nonathletic
Activity Score with the criterion which ranged from -.05 to .18.

TABLE 2
ZERO ORDER CORRELATIONS OF ACTIVITY SCORES WITH THE LEADERSHIP CRITERION

Sample N Correlations* pom m ox
a Tac Tne Tan Tte Mean| SD | Mean SD
EZ
2. SB 162 | .10 —.05 .24** | .02 6.3 | 6.3 | 10.8
3 LA AE AN SE as | 57 | 52. 92 an
4. LB .18** .06 ore | 29 3.2 | 3.8 | 7.6 | 6.5
254 | ‘ogee | iige] 20 | .24t* | 3.6 | 4.2 6.7 | 5.5

* Subscripts represent the following variables: a, Athletic Activity
Score; n, Nonathletic Activity Score; t, Total Activity Score; and c,
the leadership criterion.
** Significant at the .01 level.

The multiple regression equations derived from each sample yielded
spuriously high multiple Rs ranging from .36 to .62. When the
regression equations derived from each sample were cross-validated on
each other sample, the results shown in Table 3 were obtained. It is
of special interest to note the cross-validities of the small high
school samples on each other and the cross-validities of the large
high school samples on each other. Multiple Rs shrank from .47 to .17
and from .62 to .19 in the small high school samples. In the large
high school samples the multiple Rs shrank from .36 to .19 and from
.42 to .14.
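The cross-validation step can be sketched as follows: weights derived in one sample are applied unchanged to the predictor scores of another sample, and the resulting composite is correlated with that sample's criterion. The data, weights, and function names below are hypothetical and serve only to show the computation.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cross_validity(weights, predictors, criterion):
    """Correlate a weighted composite, using borrowed weights, with the criterion."""
    composite = [
        sum(w * x for w, x in zip(weights, row)) for row in predictors
    ]
    return pearson_r(composite, criterion)

# Hypothetical weights from one sample applied to four cadets of another.
weights = [1.0, -0.5]
predictors = [[3, 0], [0, 2], [2, 2], [1, 0]]
perfect = cross_validity(weights, predictors, [106, 98, 102, 102])
reversed_sign = cross_validity(weights, predictors, [100, 104, 101, 99])
```

Because the weights capitalize on chance relationships in the sample from which they were derived, the cross-validity is ordinarily lower than the multiple R in the original sample, which is the shrinkage reported above.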


TABLE 3
VALIDITIES OF REGRESSION EQUATIONS IN EACH SAMPLE

Weights                       Validated on                   Criterion
Derived from    N       1        2        3        4       Mean     SD

1. SA          162   [illeg.]   .19      .11      .03     100.7    16.1
2. SB          135     .17    [illeg.]   .19      .12      98.7    15.8
3. LA          306     .16      .21    [illeg.]   .14     100.1    14.5
4. LB          254     .11      .29      .19    [illeg.]  100.1    15.5




SUMMARY AND CONCLUSIONS


The purpose of this study was to de- 
termine whether a self-report question- 
naire about high school activity participa- 
tion can be used to predict later leadership 
peer ratings. An inventory on high school 
activity participation was administered 
to a total of 956 aviation cadets undergo- 
ing preflight training at Lackland Air 
Force Base. The criterion consisted of the 
summation of T-scored leadership peer 
ratings collected after the fourth and after 
the tenth week of training. The data were 
analyzed by computing zero order and 
multiple correlations between the predic- 
tor items and the criterion. 

The analysis revealed consistently low 
but positive correlations between inven- 
tory scores and the criterion. The multiple 
regression equations derived from each
subsample yielded multiple Rs which
shrank upon cross-validation on comparable
samples to values which ranged from
.14 to .19 with a median of .18. Three of
the four zero order correlations between 
the Athletic Activity Index and the cri- 
terion were significantly different from 
zero at the .01 level. Only one of the four 
correlations between the Nonathletic Ac- 
tivity Index and the criterion reached sig- 
nificance. 

On the basis of these results the follow- 
ing conclusions were reached: 

1. A self-report inventory on extracur- 
ricular participation succeeded in predict- 
ing leadership ratings of aviation cadets 
better than chance, but considerable error 
still remained in such predictions. The low
positive correlations do not, of course, necessarily
mean that training received in
high school activities produced increased
leadership ability since those choosing to
take part in high school activities might
very well have been above average in
leadership ability originally.

2. Although certain types of activities
tended to have differential predictive
power for people from large and small
high schools, in general activities in large
and small high schools were about equally
predictive of the criterion.

3. Athletic participation and honors 
were more predictive of future leadership 
than nonathletic participation and honors.

4. The iterative multiple regression
technique did not produce a more valid
composite than the simpler zero order
correlation technique in this study.


REFERENCES 


Greenberger, M. H., & Ward, J. H., Jr. An iterative technique for
multiple correlation analysis. IBM Applied Science Division Technical
Newsletter, 1956, No. 12, 85-97.

Hollander, E. P. The reliability of peer nominations under various
conditions of administration. J. appl. Psychol., 1957, 41, 85-90.

Krumboltz, J. D. Physical proficiency as a predictor of leadership.
USAF Personnel Train. Res. Cent. Res. Rep., 1957, No. 57-60. (ASTIA
Document No. 126391.) (a)

Krumboltz, J. D. The relation of extracurricular participation to
leadership criteria. Personnel guid. J., 1957, 35, [illeg.]-314.
(ASTIA Document No. 134902.) (b)

Krumboltz, J. D., & Christal, R. E. Predictive validities for
first-year criteria at the Air Force Academy. USAF Personnel Train.
Res. Cent. Res. Rep., 1957, No. 57-95. (ASTIA Document No. 134218.)

Trites, D. K., & Sells, S. B. Combat performance: Measurement and
prediction. J. appl. Psychol., 1957, 41, 121-130.

U. S. Department of the Army, Personnel Research Section. Follow-up
validation of predictor instruments for West Point Classes of 1944,
1945, and 1946 [illeg.] 1948 ratings on DA AGO Form [illeg.]. USA TAGO
Personnel Res. Br. tech. Res. Rep., 1949, No. 811.

U. S. Department of the Army, Personnel Research Section. Studies of
the performance of officers in combat: Relationship of West Point
measures to later combat effectiveness. USA TAGO Personnel Res. Br.
tech. Res. Rep., 1952, No. 969.


Received July 21, 1958.




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 3, 1959


EFFECTS OF SOCIAL RECOGNITION UPON THE EDUCATIONAL
MOTIVATION OF TALENTED YOUTH


DONALD L. THISTLETHWAITE¹
National Merit Scholarship Corporation 


A [major] problem in American education, made more imperative by
recent [events such as] Sputnik, is that of stimulating [talented]
students to realize their educational potential. What can be done to
[motivate] American youth to seek further [training]? Because careers
in [research and] education do not usually pay well, [talented]
students are frequently diverted to careers which, though more
lucrative, are incommensurate with their abilities. [Although] awards
for scholastic achievement and other forms of social recognition have
[long] been used to stimulate the student's motivation for higher
education, [few] controlled studies of the effectiveness of social
recognition as an incentive have been [made].

Previous studies have suggested that giving students social
recognition for outstanding performances on aptitude tests tends to
[increase] their motivation to attend college and their success in
obtaining scholarships. Thistlethwaite (1958) estimates that, of the
near winners receiving recognition in the second Merit competition,
only about 3% did not enroll in college, a dropout rate which is less
than half of the most conservative estimate (Phearman, 1949) made
before the initiation of the Merit Scholarship program. It was
[suggested] that this extremely low dropout rate was a consequence of
the public recognition which the near winners received in the Merit
program. Holland and Stalnaker (1957) report that two-thirds of the
finalists not receiving Merit Scholarships in the first Merit
competition managed to obtain scholarships from other sources and that
half of these students attributed their success in part to the
recognition they had received.

¹ The writer is indebted to Laura Kent for helpful editorial comments
on drafts of this paper. This study was partially supported by the
National Science Foundation and the Old Dominion Foundation.

To obtain a more reliable and compre- 
hensive estimate of the effects of recogni- 
tion, the present ex post facto study com- 
pares two groups of talented students 
receiving different amounts of public rec- 
ognition. The two groups differ, then, in 
the amount of social support they received 
for academic achievement; it is also pos- 
sible that they differ in their self-evalua- 
tions, as a result of learning how they 
rank relative to other talented students. 
Since both groups received some degree of 
recognition, these results tend to give a 
conservative estimate of the effects. A 
comparison of more disparate groups 
would probably reveal larger differences. 
But even though the magnitude of effects 
is probably underestimated, some of the 
student behaviors influenced by public 
recognition are revealed by the present
study.

METHOD


The first group (Group A) was selected 
from the 5126 students (80% of the Cer- 
tificate of Merit winners in the 1957 pro- 
gram) who replied to a survey sent to all 
C of M winners. Each member of Group 
A had received a C of M attesting to his 
"high potential for college achievement" 
as demonstrated by a “distinguished per- 
formance on the nationwide selection tests 
for Merit Scholarships" and had had his 
name published in a booklet which was 
distributed to all accredited junior and 
senior colleges, to universities, and to a
large number of other scholarship-granting
agencies. In addition, these students
were frequently publicized by press services
and acclaimed at high school assemblies.
The second group (Group B) was
drawn from 2848 of the second-highest- 
scoring 7500 students, who had received 
a letter of commendation and somewhat 
less recognition in the press. The pool 
from which Group B was drawn repre- 
sents the 80% replying to a survey sent 
to a 47% random sample of commended 
students. No attempt was made to an- 
nounce the names of the commended stu- 
dents to colleges and universities or to 
other scholarship donors. A count of press 
clippings shows that news items totalled 
approximately 2600 for C of M winners 
and 1100 for commended students, a ratio
of approximately two and one-half to
one.²


Matching Variables 


So that the two groups would be com- 
parable, C of M winners were matched 
with commended students on the follow- 
ing variables: (a) sex, (b) verbal aptitude 
score (the College Entrance Examination 
Board Scholarship Qualifying Test), (c) 
geographical region, and (d) father’s oc- 
cupational class. The majority of the re- 
sponses to the question about father’s oc- 
cupation were assigned to classes on the 
basis of fixed-alternative choices; the re- 
maining “free” responses were coded using 
the Minnesota Scale of Paternal Occupations
(1955). From Holland’s comparison
of father’s occupational class (as defined
by the Minnesota Scale) with family income
as reported by 800 finalists in the
1956 Merit program (1958), it is estimated
that the tetrachoric correlation between
these two variables is .64; thus,
father’s occupational class is a crude in- 
dex of the family resources available for 
sending the student to college. In addi- 


2 Commended students received their let- 
ters of commendation four to six weeks be- 
fore the C of M winners received their 
certificates. Thus the commended students 
had the advantage of prior recognition in 
applying to other donors for scholarship aid. 


tion, a rough matching was made on 
mathematical aptitude (SQT-M). The use 
of these matching variables resulted in 
1302 matched pairs, each pair made up 
of a C of M winner (Group A) and a 
commended student (Group B). There 
were no significant differences between the
two groups with respect to SQT-V, geographical
origin, or father’s occupation.
Sex distributions were identical. However,
on the mathematical subscale of the SQT,
Group A had a mean which was 1.7 units
greater than the mean of Group B, a difference
small in magnitude but nevertheless
statistically significant (p < .01).
Since both members of a pair were identical
with respect to the four primary control
variables but differed in the amount
of public recognition they had received, it
was possible to study the effects that such
recognition has on students.
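The tetrachoric correlation cited earlier in this section for father's occupational class and family income is computed from a fourfold table. The text does not say how the value of .64 was estimated; the sketch below uses the familiar cosine-pi approximation as an assumption, with hypothetical cell counts and a hypothetical function name.

```python
from math import cos, pi, sqrt

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a and d are the concordant cells (high-high, low-low) of a fourfold
    table; b and c are the discordant cells. All cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    return cos(pi / (1.0 + sqrt((a * d) / (b * c))))

# Hypothetical fourfold tables.
no_relation = tetrachoric_cos_pi(10, 10, 10, 10)
moderate_relation = tetrachoric_cos_pi(30, 10, 10, 30)
```

The approximation is closest when both variables are split near their medians; with extreme splits a fuller estimation procedure would be preferable.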


Measurement of Attitudes 


To provide a measure of the impact of
recognition upon attitudes toward intellectualism,
the following eight items were
included in the surveys (each response in
the keyed direction was weighted one):


1. College teachers complain a lot about their pay, but it seems to
me they get as much as they deserve. Disagree.

2. The work of theoreticians and scholars should be subsidized even
though it may lack practical value. Agree.

3. Science and philosophy have their place, but there are probably
many important things that can never be understood by the human mind.
Disagree.

4. I spend more of my free time reading than I do watching
television. Yes.

5. Most intellectuals are extremely competent persons and would excel
in almost any job they undertake. Agree.

6. I am annoyed by writers who go out of their way to use strange and
unusual words. No.

7. Generally the man of ideas is more important to society than the
man of action. Agree.

8. I would rather write a fine book than be an important public
figure. Agree.

SOCIAL RECOGNITION AND EDUCATIONAL MOTIVATION 


RESULTS


Attitudes Toward Intellectualism 


One of the aims of the nationwide talent search conducted by the
National Merit Scholarship Corporation is to [give recognition] to
intellectual attainment and, [thereby], to stimulate promising
scholars and scientists to seek further training. It is reasonable,
[therefore], to regard differences in attitude [toward intellectual]
or scholarly careers as [indices] of the effectiveness of the
Certificate of Merit awards. Students in Group A tend to have higher
intellectualism scores than those in Group B (p < .01). Further,
recognition has statistically significant effects upon the attitudes
of girls, students whose fathers are not employed in professional or
managerial occupations, and students with relatively high verbal
aptitudes. Boys and students whose fathers are in professional or
semiprofessional occupations tend to be influenced in the same manner,
although these differences are slightly smaller. The crucial question
is, of course, whether these changes in attitudes are paralleled by
similar changes in career plans.


Career Plans 


An examination of the vocational plans of students in Groups A and B
corroborates the [conclusion] suggested by the preceding analysis:
public recognition tends not only to change the “verbal” attitudes of
students but also to change their career plans. Students in each group
were told to “check the vocation you now plan to enter as your life
work.” A significantly greater proportion of the students in Group A
report they are planning to be college teachers or scientific
researchers. Table 2 shows that this effect, like the effect upon
attitudes toward intellectualism, is found primarily among girls and
among students whose fathers are not employed in professional or
managerial occupations.


TABLE 1
EFFECTS OF SOCIAL RECOGNITION UPON ATTITUDES TOWARD INTELLECTUALISM

                                                Percentage of AB pairs
                                      No. of    [with A higher in]
Classification(a)                     pairs     intellectualism score    p(b)

High SQT-V                             511           55.6               <.02
Low SQT-V                              468           52.2                --
Boys                                   643           53.8               <.06
Girls                                  336           55.7               <.04
Professional or semiprofessional
  father                               644           53.7               <.06
Nonprofessional father                 335           55.8               <.03
Total                                  979           54.4               <.01

a The mean SQT-V score for “high aptitude” members of Group A was
48.3; for matched Group B students, 48.2. Corresponding mean scores
for “low aptitude” A and B students were 41.3 and 40.8, respectively.
The term “low aptitude” (which designated AB pairs in which the B
member's score fell below the Group B median) is not descriptive in an
absolute sense, since those students had a mean score above the 96th
percentile relative to estimated national norms for public high school
seniors.

b Two-tailed tests of the significance of differences between
correlated proportions have been used in this, and in succeeding,
tables to estimate probability values.
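A test of correlated proportions of the kind named in the table footnote can be sketched as follows. The exact computing formula used in the study is not given, so the common large-sample form based only on the pairs that differ is assumed here, and the function name is hypothetical. For illustration, the counts are derived from the Total row of Table 1 on the further assumption that none of the 979 pairs were tied.

```python
from math import erfc, sqrt

def correlated_proportions_test(n_a_higher, n_b_higher):
    """Large-sample test for matched pairs that differ.

    Returns the normal deviate z and the two-tailed probability.
    Pairs in which the two members are tied play no part in the test.
    """
    z = (n_a_higher - n_b_higher) / sqrt(n_a_higher + n_b_higher)
    p_two_tailed = erfc(abs(z) / sqrt(2.0))
    return z, p_two_tailed

# 54.4% of 979 pairs is about 533 pairs favoring Group A (assumed: no ties).
n_a_higher = round(0.544 * 979)
n_b_higher = 979 - n_a_higher
z, p = correlated_proportions_test(n_a_higher, n_b_higher)
```

Under these assumptions the probability falls below .01, in keeping with the value shown for the Total row.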


Despite the similarity of the results in Tables 1 and 2, it appears
that items calling for an expression of vocational plans measure
different processes than items which call for the expression of more
diffuse beliefs and preferences. For example, changes in attitudes
occur primarily among students with exceptionally high verbal
aptitudes (Table 1), whereas changes in vocational plans occur
primarily among students with only moderately high verbal aptitudes
(Table 2). The former, it may be assumed, are influenced by
recognition to place a greater value upon intellectual achievement in
the arts, humanities, and social sciences, but are not attracted to
careers in scientific research. Students who have only moderately high
verbal aptitudes, on the other hand, are influenced by recognition to
go into science careers, even though their attitudes toward scholarly
pursuits do not change. In short, it is hypothesized that the
intellectualism scale is primarily sensitive to an intellectual
orientation toward the arts, humanities, and social sciences; whereas
the analysis presented in Table 2 primarily reflects differences in
orientation toward a scientific career.

TABLE 2
EFFECTS OF SOCIAL RECOGNITION UPON PLANS TO BE A COLLEGE
TEACHER OR SCIENTIFIC RESEARCHER

                                              Percentage planning to
                                              be a college teacher or
                                   No. of     scientific researcher    Percent.
Classification                     pairs(a)   Group A     Group B      diff.     p(b)

High SQT-V                           672       24.2        21.7         2.5       --
Low SQT-V                            620       23.5        17.9         5.6      <.02
Boys                                 850       24.9        22.2         2.7    [illeg.]
Girls                                442       21.9        15.4         6.5      <.02
Professional or semiprofessional
  father                             827       23.4        20.4         3.0       --
Nonprofessional father               465       24.7        18.7         6.0      <.03
Total                               1292       24.0        19.9         4.1      <.02

a The number of pairs vary among the tables because of nonresponse to
survey items.
b Two-tailed tests of the significance of differences between
correlated proportions have been used in this, and in succeeding,
tables to estimate probability values.

TABLE 3
EFFECTS OF RECOGNITION UPON PLANS TO SEEK PHD OR MD DEGREE

                                              Percentage planning
                                              to get the PhD or MD
                                   No. of     degree                   Percent.
Classification                     pairs      Group A     Group B      diff.     p

High SQT-V                           662       37.3        34.0         3.3    [illeg.]
Low SQT-V                            620       38.7        32.8         5.9      <.05
Boys                                 845       49.1        45.1         4.0    [illeg.]
Girls                                437       16.5        10.5         6.0      <.02
Professional or semiprofessional
  father                             821       40.8        35.8         5.0      <.03
Nonprofessional father               461       33.0        29.1         3.9    [illeg.]
Total                               1282       38.0        33.4         4.6      <.01


Motivation to Seek Advanced Training 


Similar trends were observed when the 
educational plans of Groups A and B were 
compared. A significantly greater propor- 
tion of Group A reported plans to seek 
the PhD or MD degree (Table 3). Slightly
greater, though not significantly greater,
effects were observed among girls, students
with only moderately high verbal
aptitudes, and among students whose fathers
are in professional or semiprofessional
occupations.


Success in Obtaining Scholarships From
Other Sources


Public recognition increases significantly the likelihood that the
student will receive scholarship aid. Sixty per cent of the students
in Group A and 55% of the students in Group B hold freshman
scholarships (p < .02). Table 4 shows that the effects of recognition
upon success in obtaining assistance occur primarily among boys,
students with only moderately high verbal aptitudes, and students
whose fathers are employed in professional or semiprofessional
occupations.

TABLE 4
EFFECTS OF RECOGNITION UPON SUCCESS IN OBTAINING SCHOLARSHIP ASSISTANCE

                                              Percentage of students
                                   No. of     holding scholarships     Percent.
Classification                     pairs      Group A     Group B      diff.     p

High SQT-V                           616       59.7        57.1         2.6       --
Low SQT-V                            582       60.3        52.9         7.4      <.01
Boys                                 791       61.3        56.3         5.0      <.05
Girls                                407       57.5        52.8         4.7       --
Professional or semiprofessional
  father                             791       53.2        47.8         5.4      <.04
Nonprofessional father               407       73.2        69.3         3.9       --
Total                               1198       60.0        55.1         4.9      <.02

It has already been pointed out that in applying to donors for
scholarships, Group B had the advantage of prior recognition. It seems
likely that, if the two groups had received their different degrees of
recognition simultaneously, there would have been an even greater
difference between them with respect to their success in winning
scholarships.


Proportions Enrolling in College

Because [nearly] all of these students are college-[bound], the
hypothesized effects of recognition upon the decision to go to college
could not be adequately tested. Approximately 96% of each group
enrolled in college. Of those who did not enroll, the majority
deferred their college plans only temporarily, usually because of
military service or lack of financial resources. Because of the
extreme marginals, no appreciable differences were expected, and none
were found, between Groups A and B in the proportions entering college
immediately.


DISCUSSION

The effects of social recognition for outstanding performances on
aptitude tests are: (a) to make more favorable the recipient’s
attitudes toward intellectualism, (b) to motivate him to enter a
college teaching or a scientific research career, (c) to stimulate him
to seek advanced degrees, and (d) to increase the likelihood that he
will obtain scholarship assistance in college. It might be expected
that public recognition has the greatest effect upon the
“late-bloomer,” the highly talented student who has not had a
particularly distinguished record of achievement in high school. Such
students are, of course, the very ones a talent search seeks to
discover and influence; those students whose talents have been largely
unrecognized by their peers and teachers should be more influenced by
public recognition than the valedictorians or salutatorians of high
school classes. To check this hypothesis, additional comparisons were
made of the AB pairs after they had been sorted by the high school
rank of the C of M winner (unfortunately high school ranks were not
available on members of Group B, so that these comparisons are less
rigorous than desired). As expected, recognition produced the greatest
increments in favorableness toward intellectualism among the students
who had high verbal aptitudes (SQT-V scores above the group's median)
but relatively undistinguished records of high school achievement
(percentile ranks below the group's median). On the other hand,
recognition did not have an exceptionally great influence upon the
“late-bloomer’s” career plans or upon plans to seek the PhD or MD
degrees. It may be that such students, being unaccustomed to
evaluating themselves as persons of outstanding talent, need time to
rethink their vocational plans. It is possible that marked “sleeper
effects” will be found among these students; that is, the effects of
public recognition upon the “late-bloomer’s” career plans may not be
fully manifest on an immediate after-test, but only after a year or
more in college.³

In interpreting these results, one point 
should be kept in mind: both Group A 
and Group B received considerable public 
recognition. Although the two groups dif- 
fered in type and amount of recognition 
received, differences were not as marked 
as those which would be found if “rec- 
ognized” and “not recognized" groups 
were compared. Consequently, the mag- 
nitude of effects is probably less than that 
which would be observed in the ideal ex- 
perimental comparison. The two groups 
are not perfectly matched, it is true, in 
mathematical aptitude, but the difference 
between them is very small. There is also 
the possibility—present in all ex post 
facto designs—that the observed effects 
are influenced by extraneous factors. On 
the other hand, the literature suggests 
that the matching control variables used 
account for most of the predictable vari- 
ance in college-going behaviors (Berdie, 
1954; Educational Testing Service, 1957; 
Hollingshead, 1952). On the basis of our
present knowledge, therefore, we may conclude
that all obvious extraneous variables
have been reasonably well controlled, and
that the results probably err in that they
underestimate the effects of public recognition.

SUMMARY


An ex post facto comparison was made 
of the educational motivation of two 


3 The present results are based upon sur- 
veys made approximately six months after 
graduation from high school. 


DONALD L. THISTLETHWAITE 


groups of talented students receiving dif- 
ferent amounts of social recognition for 
their performances on college aptitude 
tests in a nationwide scholarship competi- 
tion. Increased recognition was observed 
to increase the favorableness of attitudes 
toward intellectualism, the number of stu- 
dents planning to seek the PhD or MD 
degree, and the number planning to be- 
come college teachers or scientific re- 
searchers. The latter effect was observed 
particularly among students whose fathers
are employed in nonprofessional occupations
and among students who have
only moderately high verbal aptitudes. In
addition, increased recognition was observed
to increase the likelihood that the
student will obtain a scholarship from
his college or from some other
scholarship-granting agency.


REFERENCES 


BERDIE, R. F. After high school—What?
Minneapolis: Univer. Minnesota Press,
1954.

EDUCATIONAL TESTING SERVICE. Background
factors relating to college plans and
college enrollment among public high
school students. Princeton: Author, 1957.

HOLLAND, J. L., & STALNAKER, J. M. An
honorary scholastic award. J. higher
Educ., 1957, 28, 361-368.

HOLLAND, J. L. A note on the reliability and
validity of the Minnesota Scale for Paternal
Occupations as an estimate of
family economic status. J. appl. Psychol.,
1958, 42, 195-196.

HOLLINGSHEAD, B. S. Who should go to college.
New York: Columbia Univer.
Press, 1952.

PHEARMAN, L. T. Comparisons of high-school
graduates who go to college with those
who do not. J. educ. Psychol., 1949, 40,
405-414.

THISTLETHWAITE, D. L. The conservation of
intellectual resources. Science, 1958,
128, 822-826.

UNIVERSITY OF MINNESOTA, INSTITUTE OF
CHILD WELFARE. The Minnesota Scale
for Paternal Occupations. Minneapolis,
Minn.: Author, 1955.


Received October 22, 1958. 




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 3, 1959


INTERCORRELATIONS AMONG VARIOUS INTELLIGENCE,
ACHIEVEMENT, AND SOCIAL CLASS SCORES


LOTUS M. KNIEF
Wartburg College

AND JAMES B. STROUD
State University of Iowa


This investigation was planned, first, to
provide some data on the social class
issue in intelligence testing
and, second, to ascertain interrelationships
among some relatively new intelligence
tests and scholastic achievement. Scores on
the following measures were intercorrelated:
(a) Lorge-Thorndike Intelligence
Tests, Verbal; (b) Lorge-Thorndike Intelligence
Tests, Nonverbal (Level 3 in
both instances); (c) Davis-Eells Games;
(d) Raven's Progressive Matrices;
(e) Iowa Tests of Basic Skills;
and (f) the Warner Index of Status
Characteristics.


PROCEDURE


All the tests except the Progressive
Matrices were administered to a sample of
344 fourth grade pupils in a Midwestern
city of about 80,000 general population.
The basic sample consisted of all the pupils
who were present at the time the tests
were administered in 6 of the 18 elementary
schools. The 18 schools were ranked
upon the basis of mean scores achieved on
the Iowa Tests of Basic Skills, administered
in the testing program. From this
ranking every third school was selected
and incorporated in the sample. The Progressive
Matrices test was administered
the following year to the pupils, now fifth
grade, in every other fourth grade class in
the six schools used originally. This procedure
yielded only 164 pupils tested by
the Progressive Matrices who had also
taken the other tests the year previously.
The Iowa Tests of Basic Skills were
administered in January 1957. The Davis-Eells
test and the verbal and nonverbal
tests of the Lorge-Thorndike battery were
administered two or three weeks later; the
Progressive Matrices, in the spring of 1958.

A modified Warner scale based upon oc- 
cupation, house type, and dwelling area, 
weighted 5, 4, and 3, respectively, was 
used to assign social class ratings. Fathers' 
occupations were obtained from the school 
records. In 54 per cent of the cases the 
employers were interviewed in order to ob- 
tain a more precise job description. In as- 
signing house-type and dwelling-area val- 
ues the writers were assisted greatly by 
the City Assessor of the city involved. Pu- 
pils in the sample were rated on a 7-point 
scale on each of the three categories used,
the assigned ratings were multiplied by
the weights indicated above, and the products
summed to provide a total ISC score
value for each pupil in the sample. According
to Warner estimates the following Ns
would be expected for a sample of the size
here used: 11, 34, 96, 117, and 86, for UU
and LU (combined), UM, LM, UL, and
LL classes, respectively. The obtained values
were 11, 34, 100, 117, and 82.
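
The weighting just described can be illustrated with a minimal sketch. The weights 5, 4, and 3 and the 7-point scale come from the text; the function name, the dictionary keys, and the example pupil are hypothetical and are not part of the original procedure.

```python
# Weights stated in the text for occupation, house type, and dwelling area.
ISC_WEIGHTS = {"occupation": 5, "house_type": 4, "dwelling_area": 3}


def isc_score(ratings):
    """Return the weighted sum of three 7-point ratings (hypothetical helper)."""
    for name, value in ratings.items():
        if name not in ISC_WEIGHTS:
            raise KeyError(f"unknown category: {name}")
        if not 1 <= value <= 7:
            raise ValueError(f"{name} rating must be between 1 and 7")
    # Multiply each rating by its weight and sum the products.
    return sum(ISC_WEIGHTS[name] * ratings[name] for name in ISC_WEIGHTS)


# Hypothetical pupil rated 4 on every category.
example = isc_score({"occupation": 4, "house_type": 4, "dwelling_area": 4})
```

With these weights the total can range from 12 (all ratings 1) to 84 (all ratings 7), which is consistent with a sample mean near 50.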

In order to describe the sample further 
and to supply data on the behavior of the 
tests used, means and standard deviations 


are here given:


VARIABLE              MEAN     SD
Warner ISC            50.9    14.8
Davis-Eells IQ       104.5    15.4
L-T Verbal IQ        106.5    13.9
L-T Nonverbal IQ     108.8    13.7
ITBS Composite¹       45.1    10.4


¹ Iowa Tests of Basic Skills scores are
grade equivalent scores. 45.1 should be read
as 5.1 months into the fourth grade.


RESULTS 


The zero order correlations among ISC, 
the intelligence scores, and ITBS Compos- 
ite are shown in Table 1. As noted, the 
Progressive Matrices test was not adminis- 
tered at the same time as the other tests 
nor was it administered to all of the origi- 
nal sample. As supplemental data of some 
interest, correlations between raw scores 
on this test and scores on the other tests 
employed in this investigation were ob- 
tained and are reported in Table 1. 

It may be of some interest to report the 
correlations between each of the intelli- 
gence scores and the various ITBS subtest 
scores. This is done in Table 2. 


TABLE 1
INTERCORRELATIONS AMONG ISC, INTELLIGENCE
MEASURES, AND ITBS COMPOSITE SCORES
(N = 344)

Variable                  2      3      4      5      6
1. ISC                  .309   .304   .323   .179   .340
2. D-E IQ                      .571   .645   .405   .568
3. L-T V IQ                           .709   .437   .839
4. L-T NV IQ                                 .521   .683
5. Prog. Matrices(a)                                .450
6. ITBS Composite

a Test 5, N = 164.


TABLE 2
CORRELATIONS BETWEEN ISC, THE INTELLIGENCE
MEASURES, AND VARIOUS ITBS SUBTESTS
(N = 344)

                        ITBS Tests (five subtest columns)
ISC                     .255   .323   .290   .365   .288
D-E IQ                  .516   .561   .532   .446   .515
L-T V IQ                .716   .790   .752   .726   .739
L-T NV IQ               .580   .608   .679   .576   .663
Prog. Matrices(a)       .959   .438   .424   .374   .449

a N = 164.


TABLE 3
PARTIAL CORRELATION COEFFICIENTS
AMONG ISC, INTELLIGENCE, AND
ITBS COMPOSITE SCORES
(N = 344)

Variables(a)   Partial Coefficient   Zero Order r's
51.4                 .518*             51 = .568
52.4                 .821*             52 = .839
53.4                 .644*             53 = .683
54.1                 .211*             54 = .340
54.2                 .163*             54 = .340
54.3                 .173*             54 = .340

a 5 = ITBS Composite
  1 = D-E IQ
  2 = L-T V IQ
  3 = L-T NV IQ
  4 = ISC

* Significant at .01.


Further analysis of the data was attempted
by means of partial and multiple
correlation procedures. Correlations were
computed (a) between scores on each of
the original three intelligence tests and
ITBS Composite, with ISC scores partialled
out, and (b) between ISC and ITBS
Composite scores, with scores on each of
the original three intelligence tests partialled
out in turn. The coefficients are
shown in Table 3, together with zero order
correlations between the first two variables.
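
As a check on the partialling procedure, the sketch below applies the standard first-order partial correlation formula to the zero order r's printed in Tables 1 and 3. The function and variable names are illustrative only; the article does not describe its computing routine.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero order r's as printed (5 = ITBS, 1 = D-E, 2 = L-T V, 3 = L-T NV, 4 = ISC).
r51, r52, r53, r54 = 0.568, 0.839, 0.683, 0.340
r14, r24, r34 = 0.309, 0.304, 0.323

# Each intelligence test with ITBS Composite, ISC partialled out.
r51_4 = partial_r(r51, r54, r14)
r52_4 = partial_r(r52, r54, r24)
r53_4 = partial_r(r53, r54, r34)
```

Rounded to three places these agree with the first three partial coefficients in Table 3 (.518, .821, .644).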

In the multiple correlation analysis Rs
were computed between ITBS scores and
various combinations of intelligence scores,
taking two tests at a time. These are presented
in Table 4, together with the zero
order correlations between ITBS and the
other tests involved, for the reader's convenience.
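
The two-predictor Rs can be reproduced from the zero order correlations with the usual closed-form expression for a multiple correlation with two predictors. This is a sketch for checking purposes; the names are hypothetical and the input values are those printed in Tables 1 and 3.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Criterion 5 = ITBS Composite; predictors numbered as in Table 4.
R5_12 = multiple_r(0.568, 0.839, 0.571)  # D-E IQ and L-T V IQ
R5_23 = multiple_r(0.839, 0.683, 0.709)  # L-T V IQ and L-T NV IQ
R5_24 = multiple_r(0.839, 0.340, 0.304)  # L-T V IQ and ISC
R5_34 = multiple_r(0.683, 0.340, 0.323)  # L-T NV IQ and ISC
```

A multiple R can never fall below the larger of the two zero order r's it combines, which gives a quick plausibility check on any printed value.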

As a further step multiple regression
analysis was performed with ITBS Composite
scores as the dependent variable and
ISC and the original three intelligence
scores as independent variables. Beta
weights were determined and tested for
significance, regression equations developed,
and multiple Rs computed. The results
are reported in Table 5. The values
in each column represent the Beta weights
for the variables employed in that column.
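
Standardized Beta weights follow from the correlations alone: they solve the normal equations formed from the predictor intercorrelations and the predictor-criterion correlations. The sketch below, which assumes the zero order r's printed in Tables 1 and 3 (with r51 = .568 as listed in Table 3), solves the four-predictor case in plain Python. The solver and all names are illustrative, not the authors' routine.

```python
from math import sqrt


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move the largest remaining entry of this column onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Predictor order: ISC, D-E IQ, L-T V IQ, L-T NV IQ.
predictor_r = [
    [1.000, 0.309, 0.304, 0.323],
    [0.309, 1.000, 0.571, 0.645],
    [0.304, 0.571, 1.000, 0.709],
    [0.323, 0.645, 0.709, 1.000],
]
criterion_r = [0.340, 0.568, 0.839, 0.683]  # each predictor with ITBS Composite

betas = solve(predictor_r, criterion_r)
# R squared equals the sum of each Beta weight times its zero order r.
multiple_R = sqrt(sum(b * r for b, r in zip(betas, criterion_r)))
```

Within rounding, the solution agrees with the four-predictor Beta weights and multiple R reported in Table 5 (.067, .073, .687, .127; R = .853).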
By way of a brief discussion, attention
is called to some of the more pertinent results
obtained. All of the intelligence tests
used, as well as ITBS Composite scores,
correlated significantly with social status
(ISC scores) and, with the exception of
the Progressive Matrices, to approximately
the same extent. The partialling out of the
various intelligence scores resulted in
lower, but still significant, correlations
between ISC and ITBS. The partialling out
procedure tended to lower the correlations
between ITBS and both the Davis-Eells
and L-T nonverbal scores more than that
between L-T verbal and ITBS. However,
this tendency is more apparent than real.
When the r's involved are transformed into
z's, the relationships between ITBS
and the two kinds of intelligence scores
(verbal and nonverbal) appear to be reduced
about equally.
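
The r-to-z comparison can be sketched with Fisher's transformation, z = arctanh(r), applied to the zero order and partial coefficients of Table 3. The dictionary layout and names are assumptions made for illustration.

```python
from math import atanh

# (zero order r with ITBS, partial r with ISC held constant), from Table 3.
pairs = {
    "D-E IQ": (0.568, 0.518),
    "L-T V IQ": (0.839, 0.821),
    "L-T NV IQ": (0.683, 0.644),
}

# Reduction expressed in r units and in Fisher z units.
r_drops = {name: r0 - rp for name, (r0, rp) in pairs.items()}
z_drops = {name: atanh(r0) - atanh(rp) for name, (r0, rp) in pairs.items()}

# Ratio of largest to smallest reduction on each scale.
r_spread = max(r_drops.values()) / min(r_drops.values())
z_spread = max(z_drops.values()) / min(z_drops.values())
```

In r units the verbal test appears to lose far less than the other two, but on the z scale the three reductions are of similar size, which is the point made in the text.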
When, by means of multiple correlation, ISC
scores were combined with the various intelligence
scores, there was a slight tendency
for the procedure to increase the Rs
between Davis-Eells and L-T nonverbal
scores and ITBS more than the R between
L-T verbal and ITBS. However, again,
when z transformations are made, the differences
disappear. Our reasoning was
that, if the Davis-Eells and L-T nonverbal
tests are influenced less by cultural status
than the L-T verbal test, combining ISC
scores with those of the two former tests
should raise their correlation with ITBS
more than combining them with the verbal
scores, since the latter, by some arguments,
already have social class loadings. This
did not prove to be the case. It should be
noted that the authors of the verbal test
in question were cognizant of the culture
bias factor and went to considerable length
to minimize it.


TABLE 4
MULTIPLE CORRELATION COEFFICIENTS,
THREE-VARIABLE COMBINATIONS
(N = 344)

Combination(a)      R       Zero Order r's
5.23              .848       51 = .568
5.12              .846       52 = .839
5.24              .844       53 = .683
5.13              .708       54 = .340
5.34              .695
5.14              .594

a 5 = ITBS Composite
  1 = D-E IQ
  2 = L-T V IQ
  3 = L-T NV IQ
  4 = ISC


TABLE 5
MULTIPLE CORRELATION COEFFICIENTS AND
BETA WEIGHTS FOR REGRESSION ANALYSES
(N = 344)

                        ITBS Composite Scores
Multiple Rs     .853     .851     .851     .849     .711
ISC             .067**   .074**            .076**   .111*
D-E IQ          .073              .083**   .117*    .200*
L-T V IQ        .687*    .702*    .694*    .749*
L-T NV IQ       .127*    .161*    .137*             .518*

* Significant at .01.
** Significant at .05.


Perhaps the most important result from
the correlation analysis is the light shed
upon the behavior of the tests involved.
The zero order r's showed that the
L-T verbal test gave the best prediction
of ITBS scores, followed in order by L-T
nonverbal and Davis-Eells. The multiple
correlation analysis showed that L-T verbal
alone correlated with ITBS about as
well as did the entire battery of tests when
combined in multiple correlation analysis.
It may be noted that the deletion of L-T
verbal scores from the regression analysis
increased both the significance and magnitude
of Davis-Eells Beta weight.
Progressive Matrices correlated to a
smaller degree with ITBS than did any of
the other intelligence tests. It may also be
seen that the L-T verbal test correlated


with every one of the ITBS subtests to a 
greater degree than did any of the other 
intelligence tests. The L-T nonverbal test 
correlated with every one of these subtests 
to a greater degree than did the Davis- 
Eells test. Moreover the latter correlated 
with every one of these subtests to a 
greater degree than did the Progressive 
Matrices. 

The analyses reported in this paper give 
little justification for the use of the Davis- 


Eells Games, L-T nonverbal test, and the 
Progressive Matrices in conjunction with 
the L-T verbal test for general prediction 
purposes. This in no sense denies their use- 
fulness in individual diagnosis. In need of 
further exploration in particular is the 
significance of extreme discrepancies in 
performance on verbal and nonverbal in- 
telligence tests. 


Received October 28, 1958. 




SOME BEHAVIORS OF ELEMENTARY SCHOOL CHILDREN
RELATED TO CLASSROOM ACTIVITIES
AND SUBJECT AREAS¹

SURANG KOWATRAKUL
Chulalongkorn University


In the field of educational psychology
many questions concerning teaching-learning
interaction have been awaiting
empirical investigation. Many psychologists
agree that empirical studies in the
classroom will prove useful in a variety
of ways, but this type of study when
rigorously performed is rather difficult to
accomplish because of several factors
within the classroom situation. Furthermore,
some studies (American Council
on Education, 1945; Anderson, 1937) have
indicated that teachers need new techniques
and methods for gathering information
to aid them in obtaining new insights
into their pupils' behavior.

Educational psychology, therefore,
would seem to need reliable instruments
and methods for measuring students' behaviors
in the classroom as well as reliable
data which may form a solid foundation
for the formulation of new concepts
and the development of new theory. The
aim of this study was to develop an instrument
for systematic observation of
pupils' behavior in what Sears has called
a "naturalistic settings" situation (1957)
and to make an empirical survey of the
relationships of student behavior
manifestations to three classroom
activities and four subject areas.


¹ Based on a dissertation done at the Laboratory
of Human Development and submitted
to Stanford University as part of
PhD requirements. Grateful acknowledgement
is made to Pauline S. Sears, Quinn
McNemar, and F. J. McDonald for
guidance and to Chuang Kashetra, Thai
Counselor, for securing financial
support for the research.


METHOD 
Definition of Behavior Categories 


Intent on Ongoing Work (IW): Intent on
ongoing work or discussion, task oriented,
performing assigned work. This category
includes active work in assigned areas and
also collecting, getting out, organizing,
and putting away materials necessary for
work. The key phrase is task oriented.

Social Work Oriented (SW): This category
includes any social remark, interchange,
or action which is work oriented.
For example, a child may initiate social
contacts in relation to his own work, he
may respond to another child's work, he
may explain a task to a peer, he may note
another's progress or show his own progress,
give or get suggestions in regard to
work, or he may initiate interaction with
a teacher in regard to a work problem.
This category is reserved for the expression
of social needs in relation to work
tasks.

Social-Friendly (SF): This category includes
any social remark, interchange, or
action. Within this category fall mutual
horseplay, friendly conversation and gestures,
and any funny faces or movements
to attract others' attention. The key words
in this category are pure social interchange
without task orientation to ongoing work.

Momentary Withdrawal (MW): This
category includes the behavior which occurs
when an individual momentarily
ceases his present task and [...] activity,
stares at a [...], assumes an [...],
not, apparently, thinking about his work
or engaging in social activity.

Intent on Work in Another Academic 
Area (WOA): Tallies fall in this category 
when a student is doing some other aca- 
demic work than that which has been as- 
signed for the present period. 

Intent on Work in Nonacademic Area 
(WNA): This category is checked when a 
student is intent on activity of his own 
which is unrelated to academic work. This 
category also includes all complex play or 
doodling activity which is purposive or 
directed toward a consciously defined goal. 
For example, the student is making a pa- 
per airplane, or cleaning his desk, or ar- 
ranging a string or chain or paper clips 
into a certain figure. 


Definition of Classroom Activities 


Independent Seat Work: In this cate- 
gory, students are engaged in work in- 
dividually. Usually a student does an as- 
signed job which has a clearly defined goal. 

Watching and Listening: This category 
of classroom situation is employed when 
students are supposed to play a passive 
role. Examples: listening to the teacher, 
watching a demonstration, watching an 
educational program on TV. 

Discussion may be defined as symbolic 
interaction among three or more persons. 
Such interaction may be structured by a 
participating formal leader, student or 
teacher; or it may take place within a 
structure prearranged by the teacher with 
a leader appointed by her; or it may be 
at a purely informal level following the 
bent of the participants. Discussion may 
be divided into small and large groups. 


Definition of Subject Areas 


Language is defined as any kind of work 
directed towards acquiring a proficiency 
in a manipulation of symbols (oral or writ- 
ten expression), their structuring, order- 
ing, construction, and pronunciation.
Examples include spelling, reading recreational
books, and writing or giving an oral
report where emphasis is on successful
communication rather than content. 

Arithmetic is defined as any work con- 
cerned with numbers and their relation- 
ships. 

Social Studies consists of work in human 
relations, civics, current events, history, 
news, and world events. 

Science consists of any study concerned 
with the principles of nature, facts of nat- 
ural phenomena, or methods of the bio- 
logical, physical, and social sciences. 


Sample 


Fifty-six children in Grades 5 and 6 of a small
elementary school were observed. This
number of Ss includes 15 boys in Grade 5
and 18 in Grade 6, and 10 girls in the
former class and 13 in the latter.²


Method of Gathering Data 


This study employed the point-time
sampling technique in which an observer
observes an S long enough to record one
behavior, then passes on to the next S
immediately. The behavior of the next S
must be independent of that of the first in
order to ensure independent behavior
scores. Observations in this fashion continue
until one "behavior point" of each
individual in the group has been recorded.
Then a new round of observation begins.
Observation experience indicated that each
round took about four to five minutes in
the sixth grade class and about three to
four minutes in the fifth grade class.

For efficiency and preciseness in recording,
observation record sheets were prepared
prior to the time of gathering data.
The observer also described in detail the
nature of the classroom activity at the
time of observation. Data were recorded
only when classes were engaged in the
particular activities defined above. No
recordings were made during intervals of
transition from one activity to the next.
Sometimes classroom activities were divided,
in which case behavior frequencies
were [...]. Data were not collected when
a substitute teacher replaced the regular
teacher. Because this study was of a
"naturalistic" type, the observer tried
not to interfere with the plan of the class.
The observer also attempted to reduce
the [...] effect which might arise from her
subjective [...] of the pupils. During
the period of collecting data she tried to
avoid [...] with individual pupils.

Table 1 shows the mean number of
time samples per student under each classroom
activity and subject area in each
grade. [...] total number of time samples
[...]


² Although sex differences are likely in
some of the six behaviors defined above,
analyses separately for the sexes were not
contemplated because of consequent small
Ns and because the emphasis is on behavior
of a class as a whole. The results, therefore,
cannot safely be generalized to
sex-segregated classes.


TABLE 1
MEAN NUMBER (PER STUDENT) OF TIME
SAMPLES FOR THREE CLASSROOM
ACTIVITIES AND FOUR SUBJECT AREAS
(Ns: Grade 6, 31; Grade 5, 25)

                                          Subject Area
Classroom Activity      Grade   Science   Social    Arithmetic   Language
                                          Studies
Independent Seat Work     6        65       60         66           84
                          5        82       68         63           89
Watching-Listening        6       118       77         61           62
                          5        62       62         55           61
Discussion                6        65       65         64           51
                          5        58


Scoring and Reliability

For reasons beyond the control of the
experimenter the number of time samples
(or observations) was not constant for all
students; hence it was necessary to use a
percentage scoring system. Thus, e.g., a
score of a student for the IW category is
the number of behaviors judged as IW
relative to the total number of observations
on that student under a specified
condition (such as Arithmetic-Discussion).
For each student there were 12 scores
(corresponding to the 12 combinations of
conditions) for each of the six behavior
categories.
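
The percentage scoring system can be illustrated with a short sketch. The six category codes come from the text; the function name and the tallies for the example student are hypothetical.

```python
from collections import Counter

CATEGORIES = ("IW", "SW", "SF", "MW", "WOA", "WNA")


def percentage_scores(observations):
    """Convert one student's tallies under one condition to percentage scores."""
    if not observations:
        raise ValueError("at least one time sample is required")
    counts = Counter(observations)
    total = len(observations)
    # Each score is the share of time samples judged to fall in that category.
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


# Hypothetical student observed 20 times during Arithmetic-Discussion.
obs = ["IW"] * 14 + ["SW"] * 3 + ["SF"] * 2 + ["WNA"] * 1
scores = percentage_scores(obs)
```

Because each score is relative to the student's own number of observations, students with unequal numbers of time samples remain comparable.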

Score reliability was ascertained by correlating
percentage scores for odd-numbered
vs. even-numbered time samples.
Since the 30 computed Spearman-Brown
coefficients (Table 2) were generally high,
it seemed unnecessary to determine all of
the possible 144 reliability coefficients.
Many of the computed reliabilities are
sufficiently high to suggest that adequate
reliability could be attained with fewer
observations.


TABLE 2
SPEARMAN-BROWN RELIABILITIES OF BEHAVIOR SCORES

                                                 Behavior Categories
Classroom Activity      Subject Area      IW     SW     SF     MW     WOA    WNA

Grade 5
Watching-Listening      Science          [...]  .986   .956   .970   .804   .985
Independent Seat Work   Language         [...]  .953   .975   .960   .984   .960
[...]                   Social Studies   .941   .928   .804   .881   .982   .943

Grade 6
Independent Seat Work   Arithmetic       .957   .942   .737   .969   .953   .953
[...]                   Science          .943   .994   .805   .905   .875   .988
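
The odd-even procedure can be sketched as follows: correlate the two half-scores across students, then step the half-length correlation up with the Spearman-Brown formula, r_full = 2r / (1 + r). The percentage scores below are hypothetical and serve only to show the computation.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from a split-half correlation."""
    return 2 * r_half / (1 + r_half)


# Hypothetical IW percentage scores for five students.
odd_samples = [70.0, 55.0, 80.0, 62.0, 90.0]   # odd-numbered time samples
even_samples = [66.0, 60.0, 78.0, 58.0, 94.0]  # even-numbered time samples

r_half = pearson_r(odd_samples, even_samples)
r_full = spearman_brown(r_half)
```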

Perhaps, in the light of the given reli- 
abilities, it is not necessary to report that 
the preliminary question of observer re- 
liability received a satisfactory answer. 
Data were collected over a period of five 
days by a trained observer (M. H. Pintler 
of the Laboratory of Human Develop- 
ment) and the writer. These two observers 
simultaneously fixated upon the same stu- 
dent, with identical observation times en- 
sured by the use of signals. Following the 
signal, the behavior of a student was im- 
mediately observed, categorized, and re- 
corded independently by the two observ- 
ers. For a total of 1,689 time samples, with 
data combined for different classroom ac- 
tivities and subject areas, there was 94% 
agreement in the categorizing. This figure
is sufficiently high to indicate that the
data collected by a single observer could
be replicated by another observer.


TABLE 3
MEANS FOR INTENT ON ONGOING WORK
(IW) UNDER THREE CLASSROOM
ACTIVITIES AND FOUR SUBJECT AREAS
(Grade 6 top half; Grade 5 bottom half)

                        Science   Social    Arithmetic   Language   Mean
                                  Studies
Independent Seat Work    76.07     82.64      78.13        74.16    77.75
Watching-Listening       73.55     71.77      73.35        76.39    73.77
Discussion               52.42     53.45      52.71        56.94    53.88
Mean of Means            67.34     69.29      68.06        69.16    68.47

Independent Seat Work    82.92     81.48      79.08        77.68    80.29
Watching-Listening       82.12     84.80      74.40        79.28    80.15
Discussion               52.36     56.00      46.88        53.24    52.12
Mean of Means            72.47     74.09      66.79        70.07    70.85
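
Observer agreement of the kind reported here is simply the percentage of time samples on which the two observers assigned the same category. A minimal sketch, with hypothetical category sequences and a hypothetical function name:

```python
def percent_agreement(observer_a, observer_b):
    """Percentage of paired time samples given the same category by both observers."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("both observers must record the same nonzero number of samples")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * matches / len(observer_a)


# Hypothetical records for four simultaneous time samples.
first = ["IW", "SW", "IW", "MW"]
second = ["IW", "SW", "SF", "MW"]
agreement = percent_agreement(first, second)
```

This index does not correct for agreement expected by chance, so it is best read alongside the split-half reliabilities.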


ANALYSES OF Data AND FINDINGS 


It was stated in the presentation of the 
method of gathering data that the fre- 
quencies for the occurrence of each type 
of behavior (Intent on Ongoing Work, So- 
cial Work, Social-Friendly, Momentary 
Withdrawal, Intent on Work in Another 
Academic Area, Intent on Work in Non- 
academic Area) of each child were col- 
lected under each condition of three class- 
room activities (Independent Seat Work, 
Watching-Listening, and Discussion) and 
four subject areas (Science, Social Studies, 
Arithmetic, and Language). The design is
such that tests of significance involve a
three-way analysis of variance, separately 
for each grade, with rows representing in- 
dividuals, columns representing the four 
subject areas, and blocks representing 
three kinds of classroom activities. This 
type of analysis was made for each de- 
pendent variable except Intent on Work 
in Another Academic Area for which the 
distributions were highly skewed. 

With I standing for Individuals, S for
Subject Areas, and A for Classroom Activity,
the main effect for S was tested
against the I x S interaction, the A effect
against the I x A interaction, and the
S x A interaction against the I x S x A
interaction. Because of space limitations
the usual analysis of variance tables, 10 in
number, will not be presented. Instead,
tables of means will be given, with the results
for Grades 6 and 5 in the top and
bottom halves, respectively. The statistical
significance of the comparisons will
next be briefly summarized.
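
The choice of error terms can be made concrete with a sketch of the three-way layout with one score per individual in each condition. The sums of squares follow the standard decomposition; the data are hypothetical (three individuals, two subject areas, two activities) and the function name is illustrative.

```python
from itertools import product


def mixed_model_f_ratios(x):
    """x[i][s][a]: one score per individual i, subject area s, activity a."""
    n_i, n_s, n_a = len(x), len(x[0]), len(x[0][0])
    cells = list(product(range(n_i), range(n_s), range(n_a)))

    def avg(values):
        values = list(values)
        return sum(values) / len(values)

    g = avg(x[i][s][a] for i, s, a in cells)
    m_i = [avg(x[i][s][a] for s in range(n_s) for a in range(n_a)) for i in range(n_i)]
    m_s = [avg(x[i][s][a] for i in range(n_i) for a in range(n_a)) for s in range(n_s)]
    m_a = [avg(x[i][s][a] for i in range(n_i) for s in range(n_s)) for a in range(n_a)]
    m_is = [[avg(x[i][s]) for s in range(n_s)] for i in range(n_i)]
    m_ia = [[avg(x[i][s][a] for s in range(n_s)) for a in range(n_a)] for i in range(n_i)]
    m_sa = [[avg(x[i][s][a] for i in range(n_i)) for a in range(n_a)] for s in range(n_s)]

    ss = {
        "I": n_s * n_a * sum((m - g) ** 2 for m in m_i),
        "S": n_i * n_a * sum((m - g) ** 2 for m in m_s),
        "A": n_i * n_s * sum((m - g) ** 2 for m in m_a),
        "IxS": n_a * sum((m_is[i][s] - m_i[i] - m_s[s] + g) ** 2
                         for i in range(n_i) for s in range(n_s)),
        "IxA": n_s * sum((m_ia[i][a] - m_i[i] - m_a[a] + g) ** 2
                         for i in range(n_i) for a in range(n_a)),
        "SxA": n_i * sum((m_sa[s][a] - m_s[s] - m_a[a] + g) ** 2
                         for s in range(n_s) for a in range(n_a)),
        "IxSxA": sum((x[i][s][a] - m_is[i][s] - m_ia[i][a] - m_sa[s][a]
                      + m_i[i] + m_s[s] + m_a[a] - g) ** 2 for i, s, a in cells),
        "total": sum((x[i][s][a] - g) ** 2 for i, s, a in cells),
    }
    df = {
        "S": n_s - 1, "A": n_a - 1, "SxA": (n_s - 1) * (n_a - 1),
        "IxS": (n_i - 1) * (n_s - 1), "IxA": (n_i - 1) * (n_a - 1),
        "IxSxA": (n_i - 1) * (n_s - 1) * (n_a - 1),
    }
    ms = {k: ss[k] / df[k] for k in df}
    # Each effect is tested against its interaction with Individuals.
    f = {"S": ms["S"] / ms["IxS"], "A": ms["A"] / ms["IxA"],
         "SxA": ms["SxA"] / ms["IxSxA"]}
    return ss, df, f


# Hypothetical percentage scores: 3 individuals x 2 subject areas x 2 activities.
data = [
    [[80.0, 50.0], [70.0, 55.0]],
    [[75.0, 60.0], [85.0, 40.0]],
    [[90.0, 45.0], [60.0, 65.0]],
]
ss, df, f = mixed_model_f_ratios(data)
```

Each F ratio would then be referred to the F distribution with the numerator and denominator degrees of freedom given in `df`.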

For Intent on Ongoing Work (see Table
3) the Activities (A) effect is significant
beyond the .0005 level for both grade
groups, with markedly lower IW during
Discussion. The Subject Area (S) effect is
significant (.0005 level) for Grade 5, but
insignificant for Grade 6. The A x S
interaction reached the .005 level for Grade
6, but was not significant for Grade 5.

For the SW variable (Table 4) both
main effects and their interaction are
highly significant (beyond .0005 level) for
both grade groups. High SW occurs
consistently (despite the A x S interaction)
during Discussion, but the high SW for
Arithmetic compared to low for Social
Studies needs to be qualified because of
the interaction.

Social-Friendly behavior (Table 5)
yielded a significant A x S interaction
(beyond the .001 level for both grades), with
insignificant S effect for Grade 5, and only
.05 level of significance for Grade 6. The
A effect is borderline in significance (.03
level for both grades). The main effects
are overshadowed by the interactive effect.

The only significant effect for MW (Table 6)
is the A x S interaction for Grade [...]
(.0005 level).

For the behavior defined as Work in
Nonacademic Area (Table 7), all three effects
for the sixth grade group were significant
beyond the .001 level as were also the A
and the A x S effects for Grade 5, with
the S effect for the latter group reaching
only the .03 level.

As mentioned earlier the distributions
of the scores of Intent on Work in Another
Academic Area (WOA) were highly
skewed. Cochran's Q formula (McNemar,
p. 232) was employed to analyze these
scores. Cochran's technique as here used
is an extension of the nonparametric
"median" test to the situation involving
correlated scores. This method does not
permit a test of the A x S interaction. The S
effect was significant at the .01 and .02
levels for the sixth and fifth grades,
respectively. The A effect was significant at
the .001 level for Grade 6, but insignificant
for Grade 5. The percentages exceeding the
appropriate medians are given in Table 8,
from which it will be noted that there was
less WOA during Arithmetic and during
Independent Seat Work.


TABLE 4
MEANS FOR SOCIAL WORK (SW)
(Grade 6 top half; Grade 5 bottom half)

                        Science   Social    Arithmetic   Language   Mean
                                  Studies
Independent Seat Work     8.87      6.94       7.29         5.55     7.16
Watching-Listening        7.36      3.68       9.19         7.19    [...]
Discussion               22.10     12.74      28.61        [...]    21.78
Mean of Means            12.80      7.78      15.03        12.04    11.91

Independent Seat Work     9.20     10.44      10.20         9.84     9.92
Watching-Listening        6.08     [...]      [...]        [...]    [...]
Discussion               [...]     [...]      [...]        [...]    [...]
Mean of Means            15.71     [...]      [...]        17.25    17.08


TABLE 5
MEANS FOR SOCIAL-FRIENDLY (SF)
(Grade 6 top half; Grade 5 bottom half)

                        Science   Social    Arithmetic   Language   Mean
                                  Studies
Independent Seat Work     6.13      4.71       5.55         8.81     6.30
Watching-Listening        5.10      5.74       5.13         4.10    [...]
Discussion                6.52     10.97       5.07         5.35     6.98
Mean of Means             5.94      7.14       5.25         6.09     6.10

Independent Seat Work     2.60      2.84       2.28         4.20     2.98
Watching-Listening        2.84      2.44       3.08         1.72     2.67
Discussion                3.48      4.76       5.32         2.40     3.99
Mean of Means             2.97      3.35      [...]         2.77     3.21


TABLE 6
MEANS FOR MOMENTARY WITHDRAWAL (MW)
(Grade 6 top half; Grade 5 bottom half)

                        Science   Social    Arithmetic   Language   Mean
                                  Studies
Independent Seat Work     4.77      3.07       4.68         6.71     4.81
Watching-Listening        5.13      6.48       4.45         3.90     4.99
Discussion                4.07      6.71       4.13         4.58     4.87
Mean of Means             4.66      5.42       4.42         5.06     4.89

Independent Seat Work     3.04      2.00      [...]        [...]     2.93
Watching-Listening        2.28      2.20      [...]        [...]     2.20
Discussion                2.72      3.56       2.24         4.60     3.28
Mean of Means             2.68      2.59      [...]        [...]     2.80


TABLE 7
MEANS FOR INTENT ON WORK IN NONACADEMIC
AREA (WNA)
(Grade 6 top half; Grade 5 bottom half)

                        Science   Social    Arithmetic   Language   Mean
                                  Studies
Independent Seat Work     3.10      2.36       2.77         4.03     3.07
Watching-Listening        7.16      8.13       5.87         5.45     6.65
Discussion               11.45     13.06       8.00         6.87     9.86
Mean of Means             7.24      7.85       5.57         5.45     6.53

Independent Seat Work     1.16      2.12       2.64         3.68     2.40
Watching-Listening        4.40      3.96       2.24         3.72     3.58
Discussion                8.04      7.40       2.24         5.64     5.83
Mean of Means             4.53      4.49       2.37         4.35     3.94


TABLE 8
INTENT ON WORK IN ANOTHER ACADEMIC
AREA (WOA): PERCENTAGES EXCEEDING
MEDIAN VALUES

                               Grade
Conditions                    6      5
Science                      55     44
Social Studies               65     64
Arithmetic                   29     24
Language                     52     56
Independent Seat Work        23     36
Watching-Listening           65     56
Discussion                   55     52
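
Cochran's Q for correlated dichotomies can be sketched as below, assuming each student's score under each condition has been coded 1 if it exceeds the appropriate median and 0 otherwise. The table of six students by three conditions is hypothetical, as is the function name.

```python
def cochran_q(table):
    """table[i][j] = 1 if student i exceeded the median under condition j, else 0."""
    k = len(table[0])
    column_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    total = sum(row_totals)
    denominator = k * total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 0s or all 1s")
    numerator = (k - 1) * (k * sum(c * c for c in column_totals) - total ** 2)
    return numerator / denominator


# Hypothetical dichotomized WOA scores: six students under three conditions.
coded = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
q = cochran_q(coded)
```

Q is referred to the chi-square distribution with k - 1 degrees of freedom, where k is the number of conditions compared.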

No comparisons were made for any of 
the behavior scores between the two grades 
because such differences as were obtained 
are possibly confounded with a teacher 
effect. Incidentally, the magnitudes of the 
over-all means in Tables 3-7 are indicative 
of the relative amounts of the several cate- 
gories of behavior. Thus about 70% of the 
behavior during the observation times was 
Intent on Ongoing Work, with about 12% 
and 18% for the sixth and fifth grades, re- 
spectively, being Social Work. The per- 
centages for the other categories are rela- 
tively small and homogeneous. 


Discussion AND INTERPRETATION 


We may conveniently group the six
students' behavior manifestations into two
categories with respect to the kind of
teacher response they evoke: teacher-approved
and teacher-disapproved. In the
first of these groups fall the behaviors
which were called Intent on Ongoing Work
and Social Work. These two types of behavior
were reinforced by teachers who
often gave verbal approval to students
who manifested them. The second group,
Teacher-Disapproved behavior, includes
the Social-Friendly, Intent on Work in Another
Academic Area, Intent on Work in
Nonacademic Area, and, perhaps, Momentary
Withdrawal. As indicated in the
previous paragraph, over 80% of the observed
behavior fell in the teacher-approved
category.


Effect of Classroom Activities 


Does the particular nature of each classroom activity influence
behavior? If one compares the means of the means of Intent on
Ongoing Work behavior for classroom activities, it will be seen
that the highest means are for Independent Seat Work and
Watching-Listening for both classes. It would seem that Discussion
activity is less conducive to this type of desirable [illegible]
behavior, whereas Discussion [illegible] favorable to Social Work
behavior [illegible]. The high scores in Intent on Ongoing Work
behavior in Independent Seat Work and Watching-Listening can be
attributed to the nature of the activities [illegible]. In the
activity called Independent Seat Work usually there were
[illegible] individually intensive work. For example, this activity
involves tasks which must be completed within a certain [illegible]
research for individual reports, [illegible] writing, on which
teachers [illegible] their evaluations. The high [illegible] in the
Watching-Listening activity can be explained by the nature of this
[illegible] activity and tasks: usually students were required to
take notes and listen carefully. The teacher often asked the
[illegible] at the end of the activity. [illegible]
Watching-Listening frequently [illegible] students' interest
because [illegible] as exemplified by motion pictures, television
productions, and guest speakers.

[illegible] higher Social Work means [illegible] are related to the
purpose of this activity, which satisfies a social need [illegible]
interaction and exchange of ideas.

If we examine the means of the means (Table 7) for the category
Intent on Work in Nonacademic Area for classroom activities, we see
that in both classes the maximum of the means of the means
[illegible] Discussion activity. This might seem to indicate the
possibility that the Discussion has two characteristics.
[illegible] who have more self-[illegible] discussion period may
[illegible] opportunity to participate [illegible] and exchanging
ideas; [illegible] self-conscious students the [illegible]
the means of Intent on Work in Nonacademic Area in
Watching-Listening and Independent Seat Work activities were
significant for both grades, with higher scores found in
Watching-Listening activity. This finding again can be explained
in terms of our previous analysis of the nature of the
Watching-Listening activity, i.e., during the period students were
not taking notes, they could engage in other activities such as
playing with their pencils, erasers, or paper clips.

Any conjecture about the relatively 
small effect of Activity on the Social- 
Friendly behavior will need qualification 
because of the marked A x S interaction. 
It is not surprising that Independent Seat 
Work provides an opportunity for more 
Work in Another Academic Area. 


Behavior Related to Subject Areas 


For Grade 6, the over-all test for the 
main effect of Subject Area on the be- 
havior Intent on Ongoing Work was in- 
significant, but in Grade 5 Subject Area 
had an effect. This contrast may reflect 
either teacher differences or a true grade 
effect. 

For both grades the highest value of the 
mean of the means of Social Work be- 
havior was for Arithmetic (Table 4), with 
the lowest in Social Studies. These results 
may derive in part from the relatively high 
degree of questioning both by the teachers 
and by students requesting help, during 
Arithmetic with fewer questions during 
Social Studies. It should be noted that it 
was not feasible to include in the research 
design the classroom activity, working in 
small groups, which was used much during 
Social Studies and which would be con- 
ducive to SW. Any further speculating 
about the effect of Subject Area on SW 
should consider the fact of A x S inter- 
action. 

Since the main effect of Subject Areas 
on Social-Friendly behavior was insignifi- 
cant for Grade 5 and reached only the .05 
level for Grade 6, and since the A x S
interaction was highly significant for both
grades, any interpretation of the difference 
between the means of the means becomes 
precarious. Note in the top half of Table 
5 that under Social Studies we have the 
highest mean (10.97) of the row for Dis- 
cussion but the lowest (4.71) in the row 
for Independent Seat Work. Similarly, 
note the 4.20 and 1.72 in the bottom half 
of the table—Language leads to most SF 
in one, and least in another, Activity. 

With regard to the differences in Intent 
on Work in Nonacademic Area associated
with Subject Areas, it is interesting to note 
that the lowest value of the mean of the 
means for Grade 5 was under Arithmetic 
while in Grade 6 the means of the means 
were lowest for Arithmetic and Language. 
Once again, the presence of A X S inter- 
action cannot be ignored when considering 
the S effect. 

The effect of Subject area on WOA (see 
Table 8) indicates less WOA under Arith- 
metic. Apparently, Arithmetic was handled 
by these teachers so as to lead to less 
teacher-disapproved behavior.


Summary 


This study was an investigation of the 
relationships of six behaviors manifested 
by 56 elementary school children (fifth 
and sixth graders) to three classroom ac- 
tivities and four subject areas. The find- 
ings demonstrated that classroom activities 
and subject areas and some of the behavior 


SURANG KOWATRAKUL 


categories observed are significantly related.

From the findings of this study one cannot legitimately infer that
the behavior of all sixth and fifth graders will replicate the
patterns uncovered here. This study did not have as its purpose any
attempt to draw generalizations for a population but rather hoped
to unfold some suggested relationships among classroom activity,
subject area, and students' behavior manifestations which may on
more [illegible] investigation prove useful in the [illegible] of a
consistent theoretical [illegible].

A secondary purpose for this study was that of developing a
reliable instrument of measurement of children's behavior which
might be applied in classrooms. This was accomplished.




Received November 22, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 3, 1959


REFUSALS AND ILLEGIBILITIES IN THE SPELLING ERRORS 
OF MALADJUSTED CHILDREN¹


HARRY H. L. KITANO 


University of California, Los Angeles 


[illegible] in subject matter areas, such as reading and spelling,
have been a major [illegible]. For example, Gates [illegible]
(1940) have attempted to [illegible] emotional difficulties as
possible [illegible] for reading and spelling failures. [illegible]
(1958) has shown that adjustment-class children (also known as
emotionally disturbed and behavior-problem children) score
significantly lower than regular-class children in reading,
arithmetic, and [illegible] subject matter areas. The purpose of
this paper is to study the quantitative and qualitative errors in
spelling of regular- and adjustment-class children.

Logically, it appears that adjustment-class children [illegible]
make more errors and different types of errors than their
regular-class counterparts. Their overall poor adjustment in school
(Bower, 1958), [illegible] (Bower, 1958), and other factors
indicate poorer spelling achievement. These, plus their higher
anxiety and rigidity (Kitano, 1958) and [illegible] controls
(Kitano, 1958) suggest the following hypotheses: (a) that
adjustment-class children will be poorer spellers than
regular-class children; (b) that adjustment-class children will
make different [illegible] than regular-class children.


Design, Sample, and Delimitations

Adjustment classes are set up for behavior-[illegible] children in
the San Francisco Unified School District. The children are
referred to as the "emotionally disturbed"; the common underlying
symptom is the inability to get along in a regular class
assignment. The classes are set up for children who have normal
intellectual potential.

¹ This research was made possible through the [illegible] of
Margaret Hol[illegible], Supervisor, Elementary Guidance;
[illegible], Principal; and the adjustment teachers of the San
Francisco Unified School District.

The regular-class children in the study 
were drawn from one school in a lower- 
middle class area. This roughly corre- 
sponds to the socioeconomic status of the 
adjustment-class group as reported in a 
previous study (Kitano, 1958). The princi- 
pal of this school describes her children as 
being from “fair to poor” in spelling 
achievement and reports the mean IQ to 
be around 95. This again roughly corre- 
sponds to the IQ level of the adjustment- 
class group (Kitano, 1958). Therefore, an 
attempt was made to match the groups in 
regard to socioeconomic status and IQ 
through this selection procedure. 

The sample consisted of all the fourth, 
fifth, and sixth grade children from the 
regular-class group of one school and all 
the fourth, fifth, and sixth grade children 
in the adjustment classes. The number in 
each group was 88; the sample consisted 
of 176 children. 

The spelling words were chosen from 
Gates’ A List of Spelling Difficulties in 
3876 Words (1937). Twenty words classi- 
fied by Gates as being between the 4.6 and 
4.9 grade levels were chosen as it was be- 
lieved that these words would be familiar, 
yet difficult enough for fourth, fifth, and 
sixth grade children. 

In a paper devoted to spelling errors, 
the delimitation of the method of classifica- 
tion of such errors is of utmost importance. 
Spache (1940) reviews the various classifi- 
cations and suggests various criteria which 
must be met in order to analyze spelling 
errors. The method of classification used in
this research is a combination of the
Spache (1940) and Gates and Russell 
(1940) classifications, modified for the ex- 
ploration of the hypotheses under investi- 
gation. The six types of errors and classi- 
fications are listed below: 

Additions and insertions, such as sticke for stick

Omissions, such as fether for feather

Phonetic errors, such as wate for wait

Substitutions and reversals 

Words refused or not completed 

Unrecognizable 


Procedure 


All testing was done in December of 
1957. The writer gave the spelling test to 
the regular-class children; the individual 
adjustment-class teachers gave the test to 
their own classes. 

Coding of the spelling errors was done 
by the experimenter. 

A comparison of the type of errors was 
made by group, and statistical significance 
was tested through chi square. 


RESULTS 


The summary of findings is presented in Table 1. The regular-class
children made significantly more errors in additions, phonetics,
and substitutions. The adjustment-class children made significantly
more errors through refusals and unrecognizable spelling. There was
no significant difference between the groups in omissions.

The regular-class children scored significantly higher on the
number of words correct. Of the total number of attempted and
legible words, the adjustment-class children missed 52.7%; the
regular-class children 17.6%.


TABLE 1
COMPARISON AND SIGNIFICANCE OF TOTAL ERRORS BY GROUPS IN ADDITIONS,
OMISSIONS, PHONICS, SUBSTITUTIONS, WORDS REFUSED, AND
UNRECOGNIZABLE SPELLING

                   Adjustment-     Regular-
                   Class           Class          Chi
                   Errors          Errors         Square      P

Additions             70             110             8.8     .02
Omissions             80             110             4.7     .10
Phonics               70             114            11.5     .01*
Substitutions        161             342            65.0     .01*
Words Refused        616             155          9756.4     .01*
Unrecognizable       312             155            52.9     .01*
Cases                 88              88
Spelling Words        20              20

* Significant at .01 level.
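
The article does not say how the chi square values in Table 1 were obtained. The sketch below is one possible reading, offered as an assumption rather than the author's stated method: each error type is treated as a two-cell comparison against an even split of the combined errors, which is reasonable only because both groups have 88 cases and 20 words. The function name and dictionary are hypothetical. Under this assumption the results for Additions, Omissions, Substitutions, and Unrecognizable land close to the printed figures; the sketch computes the statistic only and makes no claim about the degrees of freedom behind the printed probabilities.

```python
def chi_square_equal_split(count_a, count_b):
    """Chi square for two counts tested against an even split.

    Assumption: the expected count in each cell is the mean of the two
    observed counts (equal group sizes).
    """
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


# Error totals as printed in Table 1: (adjustment-class, regular-class).
table_1 = {
    "Additions": (70, 110),
    "Omissions": (80, 110),
    "Substitutions": (161, 342),
    "Unrecognizable": (312, 155),
}

chi_values = {name: chi_square_equal_split(a, b) for name, (a, b) in table_1.items()}

for name, value in chi_values.items():
    print(f"{name:15s} {value:6.1f}")
```

Rows whose printed statistic appears damaged (Words Refused) or does not match this reading (Phonics) are left out of the sketch rather than guessed at.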


Discussion 


Causes of spelling errors have been classified into three broad
categories by Cole (1936), and Gates and Russell (1940). The errors
include (a) deficiencies within the pupil, (b) difficulties
inherent in the English language, and (c) inappropriate methods of
teaching.

The errors under the first two categories were explored in the
present study. There was no attempt to control or analyze teaching
methods.

It is possible to think of errors in additions, omissions,
phonetics, and substitutions as being errors primarily due to the
difficulty of the English language, or external errors. It is of
interest to note that the regular-class children made more external
errors than adjustment-class children.

The type (a) errors, or difficulties within the pupil, manifest
themselves in errors due to "refusals" and "unrecognizable
spelling." Something within the child makes it impossible for the
child to try writing the word or to come up with something
completely unrecognizable when he does write. It is suggested that
the high anxiety of the adjustment-class child (Kitano, 1958) is
one reason for the "refusals" and the unrecognizable spelling. The
findings appear to confirm an earlier study by the author (Kitano,
1958) in which adjustment-class children were found to have
significantly higher anxiety and rigidity scores than regular-class
children. Higher anxious and higher rigid children may be
characterized by a refusal to try something new (in this case
spelling words), while children with less anxiety and rigidity
should feel relatively free to try.

[illegible] the high anxiety blocks learning; the high rigidity
channels the anxiety [illegible] to attempt or try words, and
[illegible] and the refusal to try evidently [illegible] anxiety.
This vicious circle in [illegible] subject matter area is no doubt
repeated over and over again in school and [illegible] cause for
much of the [illegible] by adjustment-class [illegible] extremely
hostile, aggressive [illegible].

[illegible] that the role of a teacher in an adjustment class is to
somehow alleviate some of the excessive anxiety, perhaps by giving
praise to those who try, no matter what the result, and to minimize
the competitiveness of the normal spelling [illegible]. Success at
a lower level, such as reduced spelling load, longer time intervals
for learning, and the minimization of the formalized spelling test
can be of possible value towards alleviating some of the anxiety
producing situations.


REFERENCES 


Bower, E. M. A process for early identifica- 
tion of emotionally disturbed children. 
Sacramento, Calif.: California State De- 
partment of Education, 1958. 

Cole, L. The elementary school subjects.
New York: Rinehart, 1936.

Gates, A. I. The psychology of reading and
spelling with special reference to disability.
New York: Columbia Univer. Teachers Coll., 1922.

Gates, A. I. A list of spelling difficulties in
3876 words. New York: Columbia Univer.
Teachers Coll., 1937.

Gates, A. I., & Russell, D. Diagnostic and
remedial spelling manual. New York:
Columbia Univer. Teachers Coll., 1940.

Kitano, H. Anxiety and rigidity in adjustment-class
children. Unpublished doctoral dissertation,
Univer. of California, Berkeley, 1958.

Spache, G. A critical analysis of various
methods of classifying spelling errors.
J. educ. Psychol., 1940, 31, 111-134.


Received November 28, 1958.


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 3, 1959 


PERSONALITY AND INDEPENDENT STUDY¹


KATHRYN KOENIG AND W. J. McKEACHIE


University of Michigan


Does it make any difference how we teach? Despite a number of
carefully executed studies on the comparative effectiveness of
various teaching methods, there is little evidence to support the
view that one teaching method is more effective than any other.

At the moment, our University of Michigan project is most vitally
concerned with one explanation of the finding that there is little
difference in the performance of students taught by one method as
compared to other methods. Our assumption is this: "Any teaching
method is effective only for certain students; when we compare
teaching methods on group measures, the effects upon different
students cancel each other out."

In this study we hypothesized that the highly independent students
would prefer learning, perform better, and be more involved in an
independent study situation. Likewise, we hypothesized that
students with high need for affiliation would prefer, perform
better, and be more involved in small group discussions than would
other students. Finally, we hypothesized that students high in
n Achievement would do well in independent study.


DESIGN OF THE EXPERIMENT


The experimental groups consisted of 124 students (89 females, 35
males) who were enrolled in a single lecture section of an
elementary psychology course which met twice weekly. The students
also met twice weekly in six discussion sections of about 20
students each.²


¹ This study was part of a larger study supported by a grant from
the Ford Fund for Advancement of Education. Substantial portions of
this paper were presented in a symposium, "Experimental Studies in
Learning Independently," at the 1958 APA meetings.

In order to test our hypotheses we manipulated the teaching methods
in the sections experimentally so that during the semester each
student would participate in small group discussions and
independent study in lieu of the regular discussion classes. In
order to achieve this, the regular discussion sections were
discontinued for two weeks and each student in one half of the
class wrote a paper independently while students in the other half
met in small groups. Later in the semester, the experimental
procedure was repeated with the groups reversed so that each
student participated in both variations of teaching method.

Students in the lecture group took the following measures of
personality: the California Psychological Inventory, which gave
scores on 18 traits; Stott's inventory, Every-day Life, which
yielded three measures of self-reliance; and finally the Thematic
Apperception Test, as modified by McClelland, which was scored for
Need for Achievement, Need for Power, and Need for Affiliation. In
addition, observers recorded the interaction in the small group
discussions, using our modified Bales recording system.³

Measures of the students' preferences, involvement, and performance
were administered after each variation of [illegible] and at the
end of the semester. Students were asked to rate the importance of
the teaching method and the effectiveness of the method. In
addition the student [illegible] which method he preferred and how
[illegible] in that portion of the [illegible]. The instructors
evaluated the students' performance in the small groups, on the
paper, and in the total course.


² The cooperation of the discussion leaders, Myron Braunstein,
Nathan Bro[illegible], and Sarah Curtis is greatly appreciated.

³ Richard Mann devised the categorization system used and trained
our observers.


RESULTS AND DISCUSSION


Since previous studies had indicated that [illegible] may occur for
men and women, [illegible] were done separately for [illegible]. In
general, the personality variables were used as independent
variables and the ratings, grades, and observation records were
used as dependent variables.

[illegible] of the data was [illegible]. The two major hypotheses
were not [illegible]. The measures of self-reliance or
n Affiliation were not related to satisfaction, performance, or
involvement in the experimental groups.

[illegible] at n Achievement, our results are a little more
encouraging. Although there were no significant differences for
men, high n Achievement women preferred the innovations (both small
group and independent study sessions) to lectures. The middle
n Achievement women, on the other hand, preferred lectures.
[illegible] studies have indicated that middle n Achievement people
are high in fear of failure and thus may wish to avoid [illegible]
independent study and small group procedures because they
[illegible] skills which they are not sure [illegible]. High
n Achievement women, on the other hand, may find these innovations
more challenging than the usual lecture [illegible].

[illegible] differences. High n Power women participate less in the
small groups than do women low in n Power. However, the direction
of this relationship is reversed with men.

The trait of flexibility is even more puzzling. It is related to
participation in the small groups. This CPI scale has a high
negative correlation with the California F (Authoritarian) Scale,
and it is not surprising that the flexible nonauthoritarian
individual should participate freely in the rather permissive small
groups. This is what we find for women. For men, however, this
scale predicts in the opposite direction. Highly flexible men are
significantly less likely to participate than men low in
flexibility.


TABLE 1
RELATIONSHIP BETWEEN WOMEN'S NEED FOR ACHIEVEMENT AND TEACHING
METHOD PREFERENCES

Need Achievement            Preference for Method
(Women)              Lecture    Small Group    Independent

High                    2            8              5
Mid                    16            4              3
Low                     7            7              6

Chi square = 12.6.
Probability = .05.


TABLE 2
RELATIONSHIP BETWEEN WOMEN'S NEED FOR POWER AND AMOUNT OF
PARTICIPATION IN SMALL GROUP DISCUSSIONS

Need Power              Amount of Participation
(Women)                 High          Low

High                      9        [illegible]
Mid                       7           16
Low                      14            7

Chi square = 7.1.
Probability = .05.


TABLE 3
A COMPARISON OF THE PARTICIPATION OF HIGH NEED POWER MEN AND HIGH
NEED POWER WOMEN IN THE SMALL GROUPS

                        Amount of Participation
High Need Power         High          Low

Men                      10           28
Women                    10           23

Chi square = 10.0.
Probability = .001.
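
The chi square reported for the preference table (Table 1 of this article) can be checked with an ordinary Pearson test of independence. The sketch below is a hypothetical illustration, not the authors' own computation: the function name is invented, and the Low/Independent cell is read as 6, which is an assumption because that cell is damaged in the print. With those cell counts the statistic comes out near 12.7 on 4 degrees of freedom, close to the printed 12.6.

```python
def chi_square_independence(table):
    """Pearson chi square and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi, df


# Rows: High, Mid, Low n Achievement (women).
# Columns: Lecture, Small Group, Independent.
# Assumption: the Low/Independent cell is 6 (the print is damaged there).
preferences = [[2, 8, 5], [16, 4, 3], [7, 7, 6]]

chi, df = chi_square_independence(preferences)
print(f"chi square = {chi:.2f}, df = {df}")
```

Several expected counts in this table fall below 5, so the approximation is rough; the sketch is meant only to show how the printed statistic relates to the cell counts.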

Despite the number of negative results, 
this study provides some encouragement 
for continuing examination of individual 
differences with teaching methods. Our 
measures of need for power and of flexi- 
bility do predict participation in permis- 
sive small groups, albeit in odd ways. We 
once again found evidence that students 
who fear failure prefer familiar well-struc- 
tured situations such as lectures, and we 
once again found sex differences even 
though we cannot explain them. 

Some additional results maintain our 
feeling that programs of independent study 
need to take account of personality. Carle- 
ton College also participated in the Ford 
Fund studies of independent work. Sixty-four of
their seniors took one of several independ- 
ent study courses last fall semester. There 
were no control groups, but the students 
did take the CPI, Allport-Vernon, and a 
scale developed at Michigan to measure 
students’ conception of the teacher's role. 

While the Allport-Vernon and CPI failed
to predict performance in independent
study, our Michigan scale did. Students
who think the instructor should be authoritarian
tended to do poorly in independent
study. However, this measure does
not relate significantly to over-all grades
in other college courses or to intelligence.


Summary AND CONCLUSIONS 


Each student of an introductory psychology course participated in
small group discussions, independent study, and regular
lecture-discussion sections. Personality data and measures of the
students' preferences and performance under each method were
collected. Neither the measures of self-reliance nor of
n Affiliation were related to satisfaction, performance, or
involvement in the small group discussions or independent study.
However, high n Achievement women preferred the two innovations to
lectures, while middle n Achievement women preferred the lecture
method. Additional findings were that high n Power women
participated less in the small groups than low n Power women,
whereas the direction of the relationship was reversed for men.
Similarly, the nonauthoritarian women participated more in the
small groups, while the nonauthoritarian man was less likely to
participate in the permissive discussions.

A second study at Carleton College showed that students who think
the instructor should be authoritarian tended to do poorly in
independent study.

⁴ Jean Calloway directed the Carleton College study. Rae and Steven
Kaplan carried out this analysis.

Although the present study utilizing the approach of
method-individual interactions does not represent a major
breakthrough, our ability tentatively to [illegible] negative
results coupled with the positive results we have so far achieved
are encouraging. The teaching-learning [illegible] is a very
complex one, one which can be mastered only through research
[illegible] permitting the study of teacher-group interactions as
well as method-student interactions. Perhaps there is some virtue
in the current college enrollment boom, for multivariate designs
require Ns much larger than any we have used to date.

Even with more definitive results, however, we would not conclude
that students with certain types of personality should be excluded
from independent study or small group discussions. As we see it,
our goal should be for all students to learn to work independently
and to participate responsibly in small groups. Rather than
excluding students who dislike independence or work in small groups
from these classes, we may want to give them special training and
attention in order to help them learn how to learn in these
situations. Increased knowledge about student personalities should
give us increased ability to achieve these goals.


Received December 2, 1958. 


THE JOURNAL OF
EDUCATIONAL PSYCHOLOGY


Volume 50          August, 1959          Number 4


THE PREDICTION OF COLLEGE GRADES FROM THE 
CALIFORNIA PSYCHOLOGICAL INVENTORY AND 
THE SCHOLASTIC APTITUDE TEST¹

JOHN L. HOLLAND 


National Merit Scholarship Corporation, Evanston, Illinois 


This study was designed to explore the usefulness of [illegible]
nonintellectual factors in predicting college grades and to provide
needed information for the development of a theory of intellectual
achievement. Since an earlier report showed that the SAT and high
school [illegible] have only low efficiency for predicting the
grades of a high aptitude sample [illegible] a narrow range of
talent (Holland, in press), the present study was performed to
[illegible] the usefulness of the CPI and SAT, [illegible] and in
combination, as predictors of scholastic achievement for a sample
of [illegible] talented college freshmen attending 291 colleges and
universities. The results of this study provide a test of the
predictive validities of these instruments, since the student
sample was tested prior to college entrance.


THE STUDENT SAMPLE

The student sample was obtained from an earlier study (Holland, in
press) in which National Merit finalists were tested with the
[illegible] a month before the fall term of 1957. The [illegible]
had been administered to [illegible] sample about seven months
before the fall term. The sample included 743 Merit Scholars and
578 Certificate of Merit winners drawn from a sample of 7500
finalists, the survivors of a nationwide competition in which
166,000 high school seniors participated (National Merit
Scholarship Corporation, 1957). Because of their over-all
psychometric and demographic similarity, Merit Scholars and
Certificate of Merit winners were combined, and the total sample
was sorted randomly into two approximately equal subgroups. Boys
and girls were analyzed separately in both subgroups. The first
subgroup (standard sample) was used to develop regression equations
for each sex, and the second subgroup (cross-validation sample) was
used for cross-validation. The Ns, means, and standard deviations
for each sample on the CPI, SAT, and HPR variables are shown in
Table 1.


¹ This study was partially supported by [illegible] the National
Science Foundation and the Old Dominion Foundation. The author
wishes to acknowledge the contributions of [illegible] Donald L.
Thistlethwaite and Laura Ken[illegible] for their critical reviews
of [illegible] paper.

Table 1 indicates that the standard and cross-validation samples
for both sexes are comparable with respect to CPI, SAT, and HPR
measures. Table 1 also shows the high aptitude levels of these
students and the accompanying restriction in range. The male
samples average 670 and 710 on the Verbal and Mathematical sections
of the SAT. The female samples average 682 and 654 on these same
variables. The corresponding standard deviations for these
variables range from three fourths to less than one half of the
standard deviation obtained in the standardization of the SAT.²


TABLE 1
THE STANDARD AND CROSS-VALIDATION SAMPLES IN TERMS OF SCHOLASTIC
APTITUDE TEST, CALIFORNIA PSYCHOLOGICAL INVENTORY, AND HONOR POINT
RATIO

                                               Boys                                  Girls
                                   Standard       Cross-validation       Standard       Cross-validation
                                   (N = 476)      (N = 481)              (N = 185)      (N = 179)
Scale                              Mean    SD     Mean    SD             Mean    SD     Mean    SD

SAT-Verbal (SAT-V)                 670.8  45.3    668.7  44.2            678.5  45.4    684.9  40.9
SAT-Math (SAT-M)                   711.5  72.7    709.1  67.3            649.2  74.3    657.9  81.4
Dominance (Do)                      31.0   5.9     31.1   5.8             30.5   5.4     30.6   5.8
Capacity for Status (Cs)            22.2   3.3     22.3   3.1             23.2   3.2    233.1   3.2
Sociability (Sy)                    26.7   4.9     26.4   4.8             26.7   4.5     26.0   5.4
Social Presence (Sp)                37.8   6.2     37.8   5.7             36.7   5.3     36.1   5.7
Self-acceptance (Sa)                23.2   3.9     22.9   3.9             22.5   3.6     22.4   3.8
Sense of Well-being (Wb)            37.6   3.6     37.4   3.9             37.6   3.7     36.8   [illegible]
Responsibility (Re)                 34.1   4.2     34.1   4.0             35.5   4.1     35.5   3.6
Socialization (So)                  38.7   5.2     39.0   5.2             40.5   5.5     39.0   5.1
Self-control (Sc)                   98.1   7.1     28.7   7.4             30.6   7.3     29.2   7.3
Tolerance (To)                      25.6   3.8     25.8   3.7             26.9   3.0     26.2   [illegible]
Good Impression (Gi)                17.4   5.7     17.9   [illegible]     18.7   5.9     17.3   [illegible]
Communality (Cm)                    25.7   1.8     25.7   1.8             25.8   1.7     25.7   1.7
Achievement via Conformance (Ac)    29.1   4.0     28.9   4.0             29.4   3.8     99.1   [illegible]
Achievement via Independence (Ai)   23.3   3.2     23.2   3.2             23.7   3.0     23.2   3.0
Intellectual Efficiency (Ie)        43.8   3.5     43.9   3.6             44.6   3.2     43.6   [illegible]
Psychological-mindedness (Py)       13.5   2.4     [illegible]  2.4       13.1   2.5     12.9   [illegible]
Flexibility (Fx)                    11.9   3.9     11.8   3.9             11.7   3.8     11.8   [illegible]
Femininity (Fe)                     10.7   3.9     16.8   3.8             22.8   3.5     22.9   [illegible]
Honor Point Ratio (HPR)              3.1   .61     2.97   .70             3.18   .55     3.14   [illegible]


CRITERION 


Freshman grades in college, or honor point ratio, were used as the
criterion of scholastic achievement. The grading systems of all the
colleges in the study were converted to HPR by means of the
standard formula³ used by the majority of institutions. Generally,
this formula was applied to collegiate grading systems by using the
equivalences given in college regulations, or in letters from the
colleges. In a few instances, numerical values had to be assigned
to letter grades on the basis of the investigator's best judgment.


² The SAT is standardized with a mean of 500 and a standard
deviation of 100.

³ All grades were converted by an honor point ratio formula where
A = 4, B = 3, C = 2, D = 1, F = 0; grades were multiplied by
credits per course and divided by total credits carried.
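
The honor point ratio described in the footnote is a credit-weighted mean of grade points. A minimal sketch of that formula follows; the function name and the sample record are hypothetical and are not taken from the study, and colleges whose grading systems needed special conversion are not modeled.

```python
# Grade-point equivalences as given in the footnote.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def honor_point_ratio(courses):
    """Credit-weighted mean grade points.

    `courses` is a list of (letter_grade, credits) pairs.
    """
    total_credits = sum(credits for _, credits in courses)
    if total_credits == 0:
        raise ValueError("no credits carried")
    points = sum(GRADE_POINTS[grade] * credits for grade, credits in courses)
    return points / total_credits


# Hypothetical freshman record (illustration only).
record = [("A", 3), ("B", 4), ("C", 3), ("A", 2)]
hpr = honor_point_ratio(record)
print(round(hpr, 2))
```

Weighting by credits means a four-credit B counts for more than a two-credit A, which is why two students with the same letter grades can have different ratios.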


RESULTS

College grades were intercorrelated with the CPI and SAT variables
for student samples which were classified in different ways. First,
zero-order correlations were computed for both the standard and
cross-validation samples, and multiple regression equations derived
from the standard samples were applied to the cross-validation
samples. Second, both the standard and cross-validation samples
were dichotomized as science or nonscience majors, and correlations
were computed


TABLE 2
CORRELATIONS OF COLLEGE GRADES (HPR) WITH THE SCHOLASTIC APTITUDE
TEST AND THE CALIFORNIA PSYCHOLOGICAL INVENTORY ACROSS 291 COLLEGES


                         Boys                          Girls
Scale          Standard    Cross-validation   Standard    Cross-validation
               (N = 476)   (N = 481)          (N = 185)   (N = 179)
SAT-Y, 
AT Math jam 04 P" .06 
Ominance Bu EN, Bb. .00 
" = z= 0 - 
pacity for Status uM - .01 “04 
ociability .19 all —.12 = 215% 
oci = 21% —4125* =, = 
Sa Pres ae | 35-30 
en: e —.15** —.12** —.15* 
esponsibif n = i. pn r^ E 
oci; es 5 .16** " ** z pkk 5 
e nation 2389 p p a 
Bie nura .15** “oat .22** 2 
ood > —.05 .05 .10 .02 
ae .02 ,19** .03 i 
Achievomon? yi .02 -.02 .08 .07 
Conto; YIN 07 19** 9g** 09 
Biever ane t 3 à A 
I ent via P 
Tu dependence 05 .05 .18* .05 
Psychologie Efficiency —.04 —.06 2 —.05 
Flexibility mindedness .01 .08 09 —.14 
Smininity —.10* —.14** —.08 —.22** 
z .96** .21** .19** .14 
* Significant at .05 level.
** Significant at .01 level.


for the subgroups. Third, correlations between college grades and
test variables were computed for eight colleges in which 24 or more
of the students in the sample were enrolled.

The results of these analyses are shown in Tables [illegible].

rm 
Btade Presents th ; 

S apa: e correlations for 

5 te her the CPI and SAT variables. 

Bresgi lciency of prediction, multiple 


Stang, ON equati 

pli dard m ations were derived from the 
ud : * and female samples and ap- 
sales, To Pi et en cross-validation 
ely Ple of test uce computational labor, a 
Bane” in the i was selected for in- 
Varig Se m equations. For the 
Pel, B » we selected the single 

latio E 


ay É 
the pn Dae highest significant cor- 
Our Classe, e SAT and from each of 
S of CPI scales suggested by 


Gough (1957, pp. 12-13). Multiple correla- 
tons and regression equations were ob- 
tained by following the DuBois (1957) pro- 
cedure. 

In z-score form, the resulting equation for boys is: z_0 = .16M + .11So - .19Sp + .17Fe. The multiple correlation for the standard male sample is .38, which is significant beyond the .01 level. Application of the equation to the male cross-validation group resulted in a correlation of .32 (P < .01), or only slight shrinkage.

The equation for girls is: z_0 = .25V - .14Sp + .06Re + .20Ac + .08Fe. The multiple correlation for the standard female sample is .42, which is significant beyond the .01 level. The application of the equation to the female cross-validation sample resulted in a correlation of .23, which is significant beyond the .01 level, although the cross-validation R is about one-half the standard R.
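
The cross-validation step can be illustrated with a short sketch: the weights derived in one sample are applied to standardized scores in a second sample, and the predicted values are correlated with the obtained grades. The weights below are those of the male equation as read here (.16M + .11So - .19Sp + .17Fe); the five students, their scores, and the choice to standardize within the second sample are assumptions made only for illustration.

```python
from math import sqrt

# Standardized weights of the male equation as read from the text.
BOYS_WEIGHTS = {"M": 0.16, "So": 0.11, "Sp": -0.19, "Fe": 0.17}


def z_scores(values):
    """Convert raw scores to z-scores using the sample's own mean and SD."""
    m = sum(values) / len(values)
    sd = sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


def pearson_r(xs, ys):
    """Product-moment correlation computed as the mean product of z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def predict(sample, weights):
    """Apply standardized weights to each student's standardized scores."""
    z = {name: z_scores(sample[name]) for name in weights}
    n = len(next(iter(z.values())))
    return [sum(weights[k] * z[k][i] for k in weights) for i in range(n)]


# Hypothetical cross-validation sample of five students (not data from the study).
cross_validation = {
    "M": [600, 650, 700, 720, 680],
    "So": [35, 40, 38, 42, 36],
    "Sp": [38, 30, 34, 28, 36],
    "Fe": [14, 18, 16, 19, 15],
}
grades = [2.1, 3.0, 2.8, 3.4, 2.5]

predicted = predict(cross_validation, BOYS_WEIGHTS)
cross_validated_r = pearson_r(predicted, grades)
```

Shrinkage is then the drop from the multiple R in the derivation sample to this cross-validated correlation, for example from .38 to .32 for boys.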

The zero-order correlations in Table 2 
are also of interest. The mathematics fac- 
tor of the SAT is statistically significant for 
both male samples (r = .17 and .15), but 
for only one of the female samples. The 
verbal factor is significant only for the male 
and female standard samples. 

For boys, the Capacity for Status (Cs), 
Sociability (Sy), Social Presence (Sp), Self- 
acceptance (Sa), Responsibility (Re), So- 
cialization (So), Self-control (Sc), Flexi- 
bility (Fx), and Femininity (Fe) scales of 
the CPI are statistically significant for both
the standard and cross-validation samples.
The Socialization scale has the highest pair 
of correlations, .22 and .29, for these sam- 
ples. For girls, only the Social Presence 
(Sp), and Socialization (So) scales are sig- 
nificant in both samples. The small size of 
the female samples may account for the 
limited number of significant correlations 
on both instruments. 

Table 3 summarizes the results of classi- 
fying students in the four standard and 
cross-validation samples as science or non- 
science majors and recomputing the corre- 
lations between grades and test variables. 
This classification was used to determine 
what effect a more homogeneous grouping 


TABLE 3
THE CORRELATION OF GRADES WITH THE SCHOLASTIC APTITUDE TEST AND CALIFORNIA
PSYCHOLOGICAL INVENTORY FOR SCIENCE AND NONSCIENCE SAMPLES


Boys Girls 
T Standard Cross-validation Standard Cross-validation 
cale 
Science diee Sc fence Eus Ses oh. " Rides since 
345) 95) | 354) ie 83) n 95) 70) 
a — 
SAT-Verbal 12* 12 02 0 04 
. . E .04 Ei .92*« .12 |= 
ES za e a| gT a 
Capacity for Status — 19 o7 r4 [^ pon E: p ie o » 3 LM 
Hostal Preeti de a e =o Joe j= [7a 
Self-acceptance see 7-37") ~ .16**)—.25* |-.18 |—.07 |-.19 m. 
Sense of Well-being E m m a n Es ae w " di 
MARNE 13% | .201* | .17*^ .23* | :25*| 36 | .21 |--02,, 
Socialization .24 18 .31**| .29**| .26*| “oge | .9g**| -30 
Self-control .18**| .05 .24**| .95*| :20 E t .04 
Tolerance . —.01 |—.20 .04 .05 .02 12 16 |-.08 
Good Impression :03 |-.03 | .23**-.00 | .10 |-:06 | .15 |--07 
Communality | .02 |-.06 |-.02 |-.08 | .06 | .10 | .13 |--0l 
Achievement via .08 .02 .19**| .93* ** «| .0l 
Conformance ` ae 16 do 
on 
Aakn E .08 |—.16 03 m 15 .26* 15 |- ,03 
nde 
Intellectual Efficiency -01 |—.15 |—.09 .09 14 .04 40 |—--14 
Psychological-mindedness .06 |—.14 .13* |—.03 a7 ‘or -.0 |-- 19. 
Flexibility —.09 |-.13 |-.13* |—.16 |-.09 |—:03 |—.24* |-.25 
Femininity -25**| .32**| .21**| .09 | i16 | ‘299 | .10 | 28 


Note. The Ns for the standard and prediction samples have been reduced by eliminating students whose choice of major field is unknown or ambiguous.
* Significant beyond .05 level.
** Significant beyond .01 level.


TABLE 4
THE CORRELATION OF GRADES (HPR) WITH SCHOLASTIC APTITUDE TEST AND CALIFORNIA
PSYCHOLOGICAL INVENTORY VARIABLES WITHIN COLLEGES


Boys Girls 
Scale 
C.LT. | Harvard} M.LT. Pacts Stanford| Yale Rad- Welles- 
w = 26) Y = sla = 25 | er sov = mla = 4 a Eola Sw 
SAT- 
a, .09 | .15 | .22 | .36*| .18 | .29*| .49* | .40* 
De 16 | .07 | .49**| .31*| .23 | .14 | .81 | .29 
y e eed 133 |-.16 | .02 | .10 |—.33 |-.04 |—.30 |—.01 
Gapueity for Status ‘19 |-.30*| .00 |—.03 |-.28 | .16 |-.19 |-.05 
Rosiat ility o |-.16 | .00 |-.06 |-.35 |-.12 |-.03 |-.04 
Serial Presence —122 |—.33**| .01 |—.03 |—.47* |-.22 |—.33 |—.05 
Sa acceptance —.07 |-.32**—.08 | .17 |—.16 |-.13 |—.25 | .29 
Sense of Well-being ‘92 | a7 | :23*| .02 | .12 | az | .00 | 34 
esponsibility “41* | .309*| .10 | .37*| .24 | .32*| .33 | .12 
v rialization 57**| 29**| .23*| .35*| .33 | .23 | .31 | .25 
oie nivel ‘45* | .31**| .29* | :23 | .44*| .32*| .22 | .328 
EN ue . .21 |-.05 19 |—.10 13 |-.07 | .16 |-.05 
Cond Impression Au*| log | .33**| .22 | .30 | .31*| .03 | .0l 
chi unality .10 |-.01 |—.03 |—.09 |-.14 .03 .04 | .15 
evement via “51*4| .23*| .35**| .21 | .32 | .18 | .23 | .24 
ienformance i i 1 j 
[d ^ 
d vement, via —.14 .08 17 .25 43* 18 37 |—-.16 
"e ependence 
iue jestual Efficiency ag |-.05 |-.03 | .24 | .21 |-.06 | .32 |-.04 
Flexi ical-mindedness | .05 | .18 | .21 |—.49** .04 | .20 :43* |—.15 
Pani iy —i32 |-.05 |-.09 | .00 |—.10 |—.07 | .01 |—.19 
inity 137 | .83**| .90 | .54**| .51**| .30* | .41* |-.39 
Note. C.I.T. and M.I.T. stand for California Institute of Technology and Massachusetts Institute of Technology, respectively.
* Significant beyond .05 level.
** Significant beyond .01 level.

would have on the correlations between grades and test variables and to discover if achievement in science requires a different pattern of aptitudes and personality traits than achievement in nonscience. Since the average of the absolute values of the correlations in Table 3 is .13, which closely approximates the average r of .12 for Table 2, it is clear that this classification produced no change in the level of relationships between grades and test variables. Table 3 also suggests that, for the freshman year, 18 of the 20 variables predict grades for science and nonscience majors with about equal efficiency. Only the Mathematics factor for boys and the Ac scale for girls are unique, in that these variables are significantly correlated with grades for both standard and cross-validation samples, but neither is significantly related to grades for the four nonscience samples.

The correlation of SAT and CPI vari- 
ables within individual colleges is shown in 
Table 4. This analysis was performed to 
evaluate the effect of a more reliable criterion
(the grading system of a single college
rather than the equating of multiple
systems) and to estimate the range of pre- 
dictive validities for single test variables 
from college to college. The 80 correlations 
for the eight colleges included in Table 4 
average .21. In contrast, the 80 correlations 
across colleges shown in Table 2 average 
.12. Presumably this difference is due to
differences in grading systems as well as in 
the aptitude level of student populations. 
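
The averages compared above are means of absolute values, so that negative and positive correlations do not cancel. A minimal sketch of that arithmetic follows; the eight values are the SAT-Verbal correlations for the eight colleges as they can be read in Table 4 (.09 to .49), and the function name is an illustrative choice.

```python
def mean_absolute_r(correlations):
    """Average the correlations after dropping their signs."""
    return sum(abs(r) for r in correlations) / len(correlations)


# SAT-Verbal correlations with grades within the eight colleges,
# as legible in Table 4 of this copy.
sat_verbal_by_college = [0.09, 0.15, 0.22, 0.36, 0.18, 0.29, 0.49, 0.40]

average_r = mean_absolute_r(sat_verbal_by_college)
lowest_r, highest_r = min(sat_verbal_by_college), max(sat_verbal_by_college)
```

For a single predictor the same list also gives the college-to-college range that the text reports.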

The general pattern of correlations in Table 4 resembles that found for the total sample; that is, high grades are negatively associated with the Cs, Sy, Sp, Sa, and Fx scales and positively associated with the Wb, Re, So, Sc, Gi, Ac, Ai, Py, and Fe scales. In scale terms, these findings suggest
that the high achiever lacks capacity for 
status, is unsociable, lacks poise and self- 
confidence, is self-deprecating and inflexi- 
ble, minimizes worries and complaints, is 
conscientious and responsible, is well con- 
trolled, and creates a favorable impression, 
does well academically under direction but 
is not as adept in situations demanding inde- 
pendent judgment, is interested in and re- 
sponsive to the feelings of others, and has 
feminine interests. In contrast, the low 
achiever is poised and socially skillful, has 
positive self-attitudes, is flexible, admits 
worries and complaints, has less intense 
superego qualities, is impulsive, creates a 
less favorable impression, possesses less mo- 
tivation for academic achievement, and has 
more extraceptive and masculine interests. 
Although individual colleges follow this 
general pattern, the eight colleges show a 
wide range of differences on a more limited 
number of scales. These findings suggest 
that achievement in the majority of col- 
leges results from a general cluster of per- 
sonality and aptitude variables, but that a 
given college may demand, in addition, a 
limited number of specific characteristics. 
For example, note that the correlations between SAT-V and grades range from .09 to .49; similarly, the correlations for SAT-M range from .07 to .49. In addition, the relative predictive value of V and M varies markedly within a given college. The Dominance scale ranges from .30 to -.33; the Capacity for Status scale ranges from .19 to -.30; the Social Presence scale ranges from .01 to -.47; the Self-acceptance scale ranges from .29 to -.32; the Good Impression scale ranges from .01 to .41; the Achievement via Independence scale ranges from -.14 to .43; the Psychological-mindedness scale ranges from .43 to -.49; and the Femininity scale ranges from .54 to -.39.

The patterning of predictors within and 
between colleges may also be due to the 
variation in institutional environments. 
Thistlethwaite (in press) found college 
press to be related to student achievement. 
Consequently, the interaction of student motivational and personality characteristics and college presses may in part produce the correlational differences among colleges shown in Table 4. This interpretation appears plausible since the majority of standard deviations for the individual colleges listed in Table 4 are smaller than standard deviations for the total samples shown in Table 2. Accordingly, the greater average size of the Table 4 correlations compared to the Table 2 correlations is probably due
to the interaction of students and college 
environments as well as the use of a more 
reliable criterion rather than to differences 
in student homogeneity on the test vari- 
ables. 

Taken together, the results suggest that for some individual colleges, specific formulae which would be substantially effective in predicting grades could be developed. This assumption appears especially plausible since the multiple Rs attained here were accomplished across a large number of colleges with a sample having a narrow range of talent. Further, the zero-order correlations for individual colleges shown in Table 4 are substantially higher than those obtained across colleges in Table 2.


DISCUSSION


For the SAT, the Verbal and Math factors appear to be about equal in predicting grades. For the CPI, the Sp, So, Re, Ac, and Fe scales have useful predictive validity, both alone and in combination with the SAT. The Sp and So scales appear more efficient, since they are significantly related to grades for all four total samples across colleges. The significant correlation of the CPI So scale with grades in all four major samples supports Gough's (1955) theory that achievement and underachievement among gifted persons is a specific facet of the general problem of socialization.

These results generally support Gough's (1955, 1957) findings, but they fail to confirm the validity of the Ai and Ie scales. However, the high aptitude levels and outstanding high school achievement records of this sample may explain these unexpected variations, particularly the relative failure of the Ai and Ie scales and the unexpected predictive efficiency of the Social Presence, Capacity for Status, and Flexibility scales. When the mean scale scores for the four across-college samples are profiled, using high school norms, the two highest scores for both male and female samples occur on the Ai and Ie scales, which are elevated approximately 2 standard deviations above the mean. Since the Scholars and Certificate winners have unusual high school records (the average rank is about 99 for Scholars and 95 for Certificate winners), it is significant that these students should peak on these scales. The restricted range of scores on these scales may reduce their effectiveness, so that other CPI scales are more predictive in this study (see Table 1).
When the CPI and SAT are compared, the CPI seems generally more efficient; however, the comparison is an ambiguous one because of the differences in time of testing and the circumstances of the administration of these instruments. Moreover, the Scholarship Qualifying Test was used as a device to select a final pool of students from the 166,000 participants in the National Merit program. As a basis for awarding scholarships, the SAT was used to retest this final pool of students. Since the SAT and SQT are similar in content and construction, the selection of the final pool by the SQT probably attenuates the correlations obtained for the SAT. The CPI was not used as a selection device, a fact which was known to the students in the sample. It seems likely that selection on the basis of SQT scores attenuated correlations for the CPI less than correlations for the SAT.

Probably more important than the ques- 
tion of the relative efficiency of these tests 
is the finding that at a high level of scho- 
lastic aptitude, personality variables may 
yield validity coefficients which are two to 
almost three times as great as those ob- 
tained using aptitude measures alone. 
Whether this would be true if both instru- 
ments were used as selective devices is, of 
course, not known. The present evidence 
contrasts sharply with that of most academic
prediction studies, which report that the 
addition of a personality variable usually 
adds only a relatively small increment to 
the aptitude-grades correlation if it in- 
creases the correlation at all. Using a com- 
bination of high school rank and SAT to 
predict grades is also relatively inefficient 
for samples of the present aptitude level. 
In a similar study of National Merit Schol- 
ars in 1956 (Holland, 1958), it was found 
that SAT-V, SAT-M, and HSR produced 
a multiple R, non-cross-validated, of only 
.16 for a total male sample of 394 science 
majors only; this result is only one-half the 
cross-validated R of .32 for the 1957 male 
sample. 


SUMMARY 


The CPI and SAT are useful in pre- 
dicting college freshman grades for a sample 
of high aptitude high school seniors. Mul- 
tiple regression equations for the CPI and 
SAT in combination cross-validate, and the 
resulting multiple Rs are two to three times 
as great as the zero-order r's obtained for 
the SAT alone. Although CPI variables 
are generally more effective than the SAT 
variables in the present study, the individ- 
ual scales in these instruments show wide 
variation in validity from college to college. 


REFERENCES 


DuBois, P. H. Multivariate correlational analysis. New York: Harper, 1957.

Gough, H. G. Factors related to differential achievement among gifted persons. Paper read at APA, San Francisco, 1955.

Gough, H. G. Manual for the California Psychological Inventory. Palo Alto: Consulting Psychologists Press, 1957.

Holland, J. L. Prediction of scholastic success for a high aptitude sample. Sch. Soc., 1958, 86, 290-293.

Holland, J. L. Determinants of college choice. Coll. & Univer., in press.

National Merit Scholarship Corporation. Second annual report. Evanston: Author, 1957.

Thistlethwaite, D. L. The development of talent by American colleges. Science, in press.

Received January 26, 1959.


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 4, 1959 


THE RELATIONSHIP OF INTELLIGENCE AND ACHIEVEMENT 
TO BIRTH ORDER, SEX OF SIBLING, 
AND AGE INTERVAL¹
SARAH M. SCHOONOVER 
Northwest Guidance Center, Lima, Ohio 


This investigation is a supplement to the writer's previous study of resemblances in the mental and educational ages of siblings (Schoonover, 1956). In the original study, three different methods of analysis of longitudinal growth records, i.e., means of the average differences, percentage reduction of variance by family membership, and correlation, produced evidence of a substantial degree of sibling resemblance in intelligence and achievement. Resemblances in intelligence were somewhat greater than they were in achievement.

The purpose of this study is to investigate the following questions: (a) What is the relationship of the ordinal position of a child to his mental test performance? (b) What is the relationship of the sex of a child's sibling to his mental test performance? (c) What is the relationship between age interval and degree of resemblance in intelligence and achievement of siblings?

Studies involving familial resemblances in ability have been numerous during the past 90 years. The contribution of the present studies is that they employ the longitudinal approach to this problem in a manner that has not been utilized in other studies analyzing sibling resemblances in intelligence or achievement.

¹ The writer is indebted to Willard C. Olson and Byron O. Hughes for their suggestions in the preparation of this manuscript.


SIBLINGS SELECTED FOR STUDY


The data were obtained from the records of the University Elementary School at the University of Michigan. All true sibling pairs with chronological age overlap, and with four or more scores per sib on the Stanford-Binet test and on the Stanford Achievement test, from the fall of 1929 through the spring of 1951, were used in this study. With these qualifications, 59 sibling pairs were found for intelligence; 64 pairs for arithmetic, education, reading, and spelling; 42 pairs for literature and social studies; 40 pairs for language; and 38 pairs for science. For additional details regarding description and selection of data the reader is referred to the earlier report (Schoonover, 1956).


METHODS FOR THE COMPARISON OF 
LONGITUDINAL SIBLING RECORDS 


For each family included in this study a 
mental growth graph and eight achieve- 
ment growth graphs were constructed. 
This meant that 344 growth graphs and 
757 individual growth curves were plotted. 

The linear equation best fitting the data 
was found to eliminate the observed vari- 
ation and to determine some constant rate 
of growth which may be used to character- 
ize the observed results. The equation of a 
straight line, y = ax + b, which gives the 
slope and the intercept of the line used to 
describe the growth-age relationship, was 
found by the method of the least squares 
fit. The linear fit for each child for intelli- 
gence and for each of the eight achievement 
variables was plotted graphically. 
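
The least squares fit of y = ax + b to one child's record can be sketched as below. The function name and the four pairs of chronological and mental ages (in months) are hypothetical and serve only to show the computation of the slope and intercept.

```python
def least_squares_line(ages, scores):
    """Fit y = a*x + b by least squares and return (a, b).

    ages: chronological ages at testing (x); scores: age scores obtained (y).
    """
    n = len(ages)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired observations")
    mean_x = sum(ages) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("all ages are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, scores))
    a = sxy / sxx            # slope: constant rate of growth
    b = mean_y - a * mean_x  # intercept
    return a, b


# Hypothetical record for one child: ages and mental ages in months.
child_ages = [72, 84, 96, 108]
child_mental_ages = [90, 105, 118, 133]
slope, intercept = least_squares_line(child_ages, child_mental_ages)
```

A slope above 1 means the fitted age score gains more than one month per month of chronological age over the period tested.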

For each pair of siblings the limits of the 
overlap of their chronological ages were 
found, and from these the midpoint of the 
age overlap was computed. From the linear 
best fit, the age scores of each sibling were 
read at these midpoints. By subtracting 
the score equal to the midpoint of the age 
overlap from each sibling's chronological 
age score, the average difference for the
overlap period was found. In other words,
each child's mental and/or achievement 
age score difference from the theoretical 
norm was obtained. The means of these 
differences were calculated for the older 
siblings and for the younger siblings. 
Using the method described above, the 
means of the differences between mental 
and/or achievement age scores and mid- 
points of the chronological age overlap 
were calculated for sibs with brothers and 
for sibs with sisters. To ascertain the re- 
liability of the differences between the 
sib-pair means, t tests were employed. 
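
One way to read the procedure above is sketched here: find the span of ages over which both siblings were tested, take its midpoint, read each sibling's fitted line at that midpoint, and subtract the midpoint itself to obtain the deviation from the chronological age norm. The age ranges, slopes, and intercepts are hypothetical, and the function names are illustrative.

```python
def overlap_midpoint(range_a, range_b):
    """Midpoint (in months) of the chronological-age span shared by two siblings.

    Each range is (age at first test, age at last test).
    """
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    if low > high:
        raise ValueError("the two records have no chronological age overlap")
    return (low + high) / 2


def deviation_from_norm(midpoint, slope, intercept):
    """Fitted age score at the midpoint minus the midpoint itself."""
    return slope * midpoint + intercept - midpoint


# Hypothetical sibling pair: ages tested (months) and fitted lines.
older_range, younger_range = (72, 144), (60, 120)
mid = overlap_midpoint(older_range, younger_range)
older_deviation = deviation_from_norm(mid, slope=1.2, intercept=5.0)
younger_deviation = deviation_from_norm(mid, slope=1.25, intercept=0.0)
```

Averaging such deviations separately over all older and all younger siblings gives the kind of means compared in the findings.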
The chronological age difference in months and the midpoint of the age overlap were found for each sibling pair. From the linear best fit, the age scores of each sibling were read at these midpoints. The average difference for each sibling pair was found by subtracting the smaller age score from the larger age score. Correlation coefficients were computed to discover the relationship between the chronological age difference and the average score difference for the sibling pairs. A Pearson product-moment formula was used, as was a t test of an observed correlation.
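
A sketch of the two computations named above follows: the product-moment correlation and the usual t test of an observed correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. That this is the exact form of the test used in the study is an assumption. The example value takes r = 0.18 for arithmetic with the 64 sibling pairs reported for that measure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired observations."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_for_correlation(r, n):
    """t statistic for testing an observed r against zero (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Largest correlation reported for age interval (arithmetic), 64 sibling pairs.
t_arithmetic = t_for_correlation(0.18, 64)
```

With 62 degrees of freedom the two-tailed .05 critical value of t is roughly 2.0, so a t of this size does not reach significance.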


FINDINGS 


Older and younger siblings. The results secured by the comparison of the means of the average differences from chronological age norms for older sibs and for younger sibs are given in Table 1 for intelligence and achievement.


TABLE 1
MEANS OF AVERAGE DIFFERENCES FROM CHRONOLOGICAL AGE NORMS FOR OLDER AND YOUNGER SIBLINGS, EXPRESSED IN MONTHS

Mental or Achievement Age Measure    Older Sib      Younger Sib
Mental                               24.8           25.7
Arithmetic                            1.0            1.4
Education                             9.4            8.0
Language                             35.8           35.7
Literature                           23.9           21.7
Reading                              18.1           14.2
Science                              25.1           24.2
Social studies                       13.4           [illegible]
Spelling                             [illegible]    [illegible]


TABLE 2
MEANS OF AVERAGE DIFFERENCES FROM CHRONOLOGICAL AGE NORMS FOR SIBLINGS WITH BROTHERS AND SIBLINGS WITH SISTERS, EXPRESSED IN MONTHS

Mental or Achievement Age Measure    Sibs with Brothers    Sibs with Sisters
Mental                               26.2                  23.9
Arithmetic                            3.0                  -0.9
Education                            10.3                   6.5
Language                             44.3                  25.8
Literature                           27.6                  17.1
Reading                              17.9                  13.4
Science                              32.7                  17.1
Social studies                       19.5                   6.7
Spelling                              1.2                  -4.1

These older and younger siblings consistently were found to have means of average differences that were very similar to each other. No differences were found to be reliable at the .05 level.

These findings are in accord with those of Hsiao (1931), Jones and Hsiao (1928), and Griffitts (1926). They are incongruent with those of Thurstone and Jenkins (1929), Willis (1924), and Koch (1954).


Jones (1954) has pointed out that in normal samples, when methodological difficulties are adequately controlled, no birth order differences in intelligence occur. The findings of this study not only support this viewpoint but, in addition, reveal no birth order differences in achievement.

Siblings with brothers and siblings with sisters. The results obtained by the comparison of the means of the average differences from chronological age norms for sibs with brothers and for sibs with sisters are shown in Table 2.

Siblings with brothers consistently were found to have larger means of average differences than siblings with sisters. In other words, sibs with brothers had higher scores than sibs with sisters in all mental and achievement measures. The differences between these sib means were significant at the .01 level or lower in language, literature, science, and social studies, and at the .05 level in arithmetic. The sib mean differences in intelligence, reading, education, and spelling were not significant at the .05 level.

These results substantiate those found by Koch (1954) and are in disagreement with those secured by Tabah and Sutter (1954). As an explanation of the finding that the possession of a male sib appears to influence mental test performance, Koch has pointed out that possibly the more aggressive and competitive male alerts his sib to a greater extent than does the passive female.

Age interval and average score difference for sibs. The results secured by the method of correlation coefficients for chronological age interval and average score difference for sibling pairs are given in Table 3. No relationship was found between age interval and average score difference, with correlations ranging from -0.09 to +0.18 for the nine mental and achievement measures.

It may be observed that these correlations substantiate each other and give some indication of the value of the correlations in the population, since they are all in a relatively narrow range.

These results are harmonious with those of Conrad (1931), Finch (1933), and Richardson (1936). On the basis of the above results, it appears that variations in birth intervals are without influence on measures of intelligence and achievement.

As a suggestion for further research it may be pointed out that if a large number of longitudinal sibling records were analyzed, the interactions of the variables of ordinal position, sibling's sex, and spacing could be investigated.


TABLE 3
CORRELATIONS BETWEEN BIRTH INTERVAL AND DIFFERENCES BETWEEN SIBLINGS IN MENTAL AND ACHIEVEMENT AGES

Mental or Achievement Age Measure    Correlation Coefficient
Mental                                0.03
Arithmetic                            0.18
Education                             0.00
Language                              0.11
Literature                            0.01
Reading                              -0.09
Science                              -0.07
Social studies                        0.03
Spelling                             -0.07


SUMMARY


An analysis was made of sibling performance on intelligence and achievement tests utilizing longitudinal data. (a) No significant differences were found between older and younger siblings in intelligence or achievement as measured by deviation from the norms for chronological age. Thus priority of birth in a family gave no advantage in intelligence or achievement. (b) Sibs, irrespective of sex, with brothers consistently had higher mental and achievement ages than sibs with sisters. (c) The relationship between interval between births and the average difference in intelligence and achievement for sibling pairs was insignificant.


REFERENCES

Conrad, H. S. Sibling resemblance and the inheritance of intelligence. Unpublished doctoral dissertation, Univer. of California, 1931.

Finch, F. H. A study of the relation of age interval to degree of resemblance of siblings in intelligence. J. genet. Psychol., 1933, 43, 389-404.

Griffitts, C. H. The influence of family on school marks. Sch. Soc., 1926, 24, 713-716.

Hsiao, H. H. The status of the first born with special reference to intelligence. Genet. psychol. Monogr., 1931, 9, 1-118.

Jones, H. E., & Hsiao, H. H. A preliminary study of intelligence as a function of birth order. J. genet. Psychol., 1928, 35, 428-433.

Jones, H. E. The environment and mental development. In L. Carmichael (Ed.), Manual of child psychology. New York: Wiley, 1954.

Koch, Helen L. The relation of 'primary mental abilities' in five- and six-year-olds to sex of child and characteristics of his sibling. Child Develpm., 1954, 25, 209-223.

Richardson, S. K. The correlation of intelligence quotients of siblings of the same chronological age levels. J. juv. Res., 1936, 20, 186-198.

Schoonover, Sarah M. A longitudinal study of sibling resemblances in intelligence and achievement. J. educ. Psychol., 1956, 47, 436-442.

Tabah, L., & Sutter, J. Le niveau intellectuel des enfants d'une même famille. Ann. Eugen. Lond., 1954, 19, 120-150.

Thurstone, L. L., & Jenkins, R. L. Birth order and intelligence. J. educ. Psychol., 1929, 20, 641-651.

Willis, C. B. The effects of primogeniture on intellectual capacity. J. abnorm. soc. Psychol., 1924, 18, 375-377.

Received July 1, 1958.


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 4, 1959 


CREATIVITY AND THE SELF-ATTITUDES AND 
SOCIABILITY OF HIGH SCHOOL STUDENTS¹


LEANNE GREEN RIVLIN 
Brooklyn College 


Current concern regarding the adequacy of our scientific and technical manpower has stimulated considerable interest in the study of the intellectually gifted. The pressing need to identify and encourage those students who are capable of high-level performance has directed attention to the need for recognizing the capable individual as early in his school career as possible.

The study of creativity of the intellectually gifted is one approach to the larger problem. Creativity as a focus emphasizes the originality aspect of giftedness. The limited research on creativity has approached the study from three views: the creative process, the creative product, and the creative person. The creative act appears to be the result of a complex interaction, in which a number of necessary conditions have been met. These conditions include:

1. A minimum of intelligence (one that is considerably higher than average)
2. Skill in a particular area (called "talent," "technique," or "technical ability")
3. Training and/or experience in the area in which there is skill
4. Opportunity to utilize or actualize 1, 2, and 3
5. The personality characteristics that facilitate the functioning of 1 through 4


¹ This article is adapted from a dissertation submitted in partial fulfillment of the requirements for the degree of doctor of philosophy under the Joint Committee on Graduate Instruction, Columbia University. The writer wishes to thank Arthur T. Jersild, Millie Almy, Irving Lorge, and [name illegible] for their encouragement and assistance.

The factor-analytic studies of Guilford and associates (Wilson, Guilford, & Christensen, 1953) might be regarded as an investigation of the process and product aspect of creativity. Gough (1955), Stein and Meer (Stein, 1953; Stein & Meer, 1954), Barron (1953), and Drevdahl (1956) have all examined personality factors relevant to creative effort. These studies indicate an independence or self-assertive
factor as characteristic of creative individ- 
uals. However, the act of originality is 
popularly associated with extreme social 
isolation, intense and private self-search- 
ing: the picture of the artist shut up in his 
garret, the withdrawn writer, the scientist 
wrapped up in his world of formulae and 
test tubes. Little objective evidence for the 
reality of the traits of isolated social mal- 
adjustment has been offered other than 
occasional life histories and biographies of 
some creative individuals. 

The major question investigated in this 
study is whether high school students se- 
lected as creative and equally able high 
school students selected as noncreative 
differ with regard to (a) self-attitudes, (b) 
sociability. The relation to creativity of 
intelligence, socioeconomic status, parental 
occupation, and parental education was 
also studied. 


METHOD 


The initial task involved the develop- 
ment of a technique for the identification 
of the creative and noncreative students. 
In view of the variability of the expression 
of creativity, it seemed advisable to employ 
a procedure based upon extended observa- 
tion of the students involved. Nomination 
by teachers was adopted as the procedure 
most likely to take this variability into 
account. The criteria of creativity used by 
the teachers when making the nominations 
were developed in a preliminary study. A
critical incident (Flanagan, 1954) form
was completed by 225 teachers from high 
schools throughout New York City, and 
the analysis revealed 14 criteria of creativ- 
ity. In summary, these were: 


1. Gives work a "personal touch"
2. Will venture into unfamiliar or new areas
3. Is sensitive to the potentialities of media
4. Works with enthusiasm and pleasure
5. Demonstrates judgment on the basis of personal standards, a sense of appropriateness, and taste
6. Is able to "let himself go" and freely respond to the source of stimulation
7. Displays a capacity for self-direction and independence
8. Simplifies a complex task by perceiving and emphasizing the essentials
9. Uses past knowledge to interpret a present problem in a manner that is original and meaningful
10. Displays judgment and foresight in planning work
11. Questions and tests the implications of facts
12. Understands when to give up plans or ideas that seem impractical or inadequate
13. Flexible in approach to problems
14. Demonstrates imaginative and original solutions to problems



The nominating teacher attended a train- 
ing conference during which the criteria 
were discussed, and each was then directed 
to select five intelligent, creative students 
and five equally able noncreative students 
in his classes on the basis of the criteria.
If he could not nominate five that would 


TABLE 1

DISTRIBUTION OF 10TH- AND 11TH-GRADE STUDENTS NOMINATED AS CREATIVE AND NONCREATIVE

Group                10th Grade   11th Grade
Creative boys            15           14
Creative girls           17           16
Noncreative boys         24           14
Noncreative girls        10           16
(Total N = 126)


LEANNE GREEN RIVLIN 


appropriately fit into each category, he 
was permitted to select fewer. 

The students selected were given a bat- 
tery of tests. These included a modification 
of Bills’ Index of Adjustment and Values 
(Bills, Vance, & McLean, 1951), and the 
Social Adjustment subtest of the Bell Ad- 
justment Inventory. Two of Bills’ three 
ratings were used in the present study. 
These were “concept of self" and “con- 
cept of ideal-self.” The "acceptance of 
self” category was omitted, since previous 
testing revealed some difficulties on the 
part of the students in using this category. 

The test form contained questions regarding parental education and occupation. Four sociometric questions were also included, in which the nominees were asked to select the students in their grade whom they would: (a) like to invite to a party, (b) like to belong to their school club, (c) like to work with on a class committee, (d) consider most creative. Pintner IQs and scores on the Iowa Tests of Educational Development were obtained from school records.


Subjects 


The Ss were 126 10th- and 11th-grade New York City high school students nominated by 25 honor class teachers, art teachers, or music teachers as either creative or noncreative. These special classes were selected in an effort to obtain Ss who were relatively homogeneous in each interest area. The classes represented [illegible] academic subject areas.


Treatment of Results 


The data were subjected to an analysis of variance. The 126 Ss were separated into subgroups according to creativity (creative or noncreative), sex, and grade (10th or 11th grade). This resulted in eight unequal cell subgroups, summarized in Table 1, which required an analysis of variance for cell means. Since responses to the sociometric questions were not available for the




CREATIVITY, SELF-ATTITUDES AND SOCIABILITY


entire 10th and 11th grades, one 10th-grade class was studied for a rough index of student evaluation of peer social attractiveness. This class, designated “4-H,” had the highest number of nominees, with 17 students represented.

In all comparisons, the .05 level of confidence was adopted.


RESULTS


The basic results, except for the sociometric data, are indicated in Table 2, containing means and standard deviations for the creative and noncreative students, with the sexes separate but the grades combined. Table 3 contains the results of the analysis of variance based on the cells described in Table 1.


Self-Attitudes


There were no significant differences in 
self-attitudes when the evaluations of cre- 
ative and noncreative students were com- 
pared. On the “ideal-self” section the boys’ 


TABLE 2

COMPARISON OF THE CREATIVE AND NONCREATIVE BOYS AND GIRLS (GRADES COMBINED) AS EVIDENCED IN THE MEANS AND STANDARD DEVIATIONS OF THE SEVERAL MEASURESᵃ

                                       Boys                             Girls
                             Creative       Noncreative       Creative       Noncreative
Measure                      M       SD      M       SD       M       SD      M       SD
Self concept               247.86  16.48   247.47  19.57    249.61  15.89   244.92  15.28
Ideal self                 279.80  11.00   278.08  10.79    284.79   6.96   280.69  11.95
Self-ideal discrepancy      38.90  16.59    37.42  18.40     41.27  17.61    44.46  14.34
Sociability (Bell subtest)   7.06   4.87     9.21   5.32      5.30   3.11     8.85   4.45
Iowa                        24.17   4.28    20.21   4.06     20.51   5.06    19.50   3.28
IQ (Pintner)               140.21  15.45   127.97  14.19    128.03  16.24   127.54  13.82

ᵃ Ns are indicated in Table 1.
TABLE 3

ANALYSIS OF VARIANCE OF THE SOCIABILITY SCORES, IOWA SCORES, AND IQS OF STUDENTS NOMINATED AS CREATIVE AND NONCREATIVE

                                 Sociability              Iowa                     IQ
Source of Variation     df    Mean Square    F         Mean Square   F          Mean Square   F
Sex                      1      60.47       3.26        155.00      7.70**       1134.02     4.75*
Creativity (Cr.)         1    [illegible] [illegible]   168.75      8.20**       1139.06     4.78*
Grade                    1    [illegible]    –           26.12      1.27       [illegible]    –
Sex × Cr.                1    [illegible]    –           53.90      2.62         1098.64     4.61*
Sex × Grade              1      18.24        –           19.98       –             38.44      –
Cr. × Grade              1      18.97       1.02      [illegible]    –         [illegible]    –
Sex × Cr. × Grade        1    [illegible] [illegible]    16.66       –         [illegible]    –
Within                 118      18.55                    20.59                    238.52
Total                  125

* Significant at the .05 level.
** Significant at the .01 level.




TABLE 4

COMPARISON OF THE CREATIVE AND NONCREATIVE STUDENTS (GRADES COMBINED) AS EVIDENCED IN THE MEANS OF THE SOCIOMETRIC MEASURESᵃ

                            Creative        Noncreative
Sociometric Question        M      SD       M      SD
(1) invite to party        1.60   1.58     .43    .53
(2) belong to club         1.60   1.43     .28    .50
(3) class committee        2.20   1.81     .28    .50
(4) creative               2.00   1.88     .14    .38

ᵃ Total N was 17, 10 creative Ss and 7 noncreative Ss, all tenth graders with sexes combined.


scores were significantly lower than those 
of the girls, representing a lower ideal. 

An analysis of 17 of the adjectives was 
then undertaken to determine the possibil- 
ity of internal differences. The adjectives 
were selected because they embodied qual- 
ities that Gough (1955) and Barron (1953) 
had indicated as discriminating individuals 
possessing traits relevant to creativity. 
This analysis revealed four adjectives which 
discriminated at the .05 level: creative,
imaginative, popular, and shy. The creative 
Ss had more positive self-evaluations on 
these traits. 


Sociability 


On the Bell subtest, the creative students 
had significantly lower sociability scores 
(representing higher social confidence) than 
the noncreative students. Table 3 indicates 
this difference, significant at the .01 level. 
There was no significant difference between 
the two grades studied, nor was there a 
significant sex difference.

The number of votes received by each student on each question yielded the sociometric score. An analysis of the sociometric choices of the 17 creative and noncreative nominees from Class “4-H” revealed that the creative students were more popular, and also were selected more often as possessing creativity. These differences, which are evident in Table 4, were significant at the .05 level. The votes for the creative students came from both their creative and noncreative classmates.


Intellective Measures 


Both the Iowa tests and the Pintner IQs revealed significant differences between the sexes and between the creative and noncreative students. The boys’ Iowa scores were superior to those of the girls, with the difference significant at the .01 level, as indicated in Table 3. A similar set of differences appeared on the IQs, significant at the .05 level. The creativity difference on both the Iowa scores and IQs came from the boys, for the creative girls did not score very much higher than the noncreative girls. We might anticipate an interaction factor to emerge in the analysis of variance. It appeared on the IQs, although not on the Iowa scores, possibly due to the very slight tendency for the creative girls’ Iowa scores to be higher than those of the noncreative girls. When the intellective measures of the nominated students were compared with the scores of all students in their respective grades, the scores of nominees, both creative and noncreative, were consistently higher than their grade averages. In the case of the IQs, the mean score of the nominees in each grade was 130. This was 24 points higher than the 10th-grade mean, and [illegible] points higher than that of the 11th grade.
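The F ratios reported for the intellective and sociability measures can be checked against the mean squares, since each F in a fixed-effects analysis of variance is the effect mean square divided by the within-cells mean square. The following is a minimal sketch in Python, not part of the original analysis; it uses only entries that are legible in Table 3, and the pairing of each entry with an effect label is an assumption drawn from the accompanying text.

```python
# Within-cells mean squares (df = 118) as printed in Table 3.
MS_WITHIN = {"sociability": 18.55, "iowa": 20.59, "iq": 238.52}

# (effect, measure, effect mean square, F as printed).
# The effect labels are assumptions inferred from the text of the Results.
PRINTED = [
    ("sex", "sociability", 60.47, 3.26),
    ("sex", "iq", 1134.02, 4.75),
    ("creativity", "iowa", 168.75, 8.20),
    ("creativity", "iq", 1139.06, 4.78),
    ("sex x creativity", "iowa", 53.90, 2.62),
    ("sex x creativity", "iq", 1098.64, 4.61),
]


def f_ratio(ms_effect, ms_within):
    """F ratio for a one-degree-of-freedom effect tested against within-cells error."""
    return ms_effect / ms_within


for effect, measure, ms, printed in PRINTED:
    computed = f_ratio(ms, MS_WITHIN[measure])
    print(f"{effect:18s} {measure:12s} computed F = {computed:.2f}, printed F = {printed:.2f}")
```

Each computed ratio agrees with the printed value to two decimals, which supports reading the three "Within" entries as the error terms for the three measures.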


Background Factors 


The parental occupation level of the creative students did not differ significantly from that of the noncreative students. However, the creative students’ parents reached a significantly higher level of education. The difference was significant at the .05 level. Parental education was represented by four scale points: 1 if elementary school was the last attended; 2 for high school; 3 for college; and 4 for postgraduate school. The average score for all creative students’ parents was 2.6, and that of the noncreative group’s parents was 2.07.


DISCUSSION


When the self-attitudes and sociability of students nominated as either creative or noncreative were studied, some of the common stereotypes reputed to characterize the creative personality were not substantiated. Recognizing procedural limitations (e.g., possible teacher bias in selection in spite of training; the limitations of paper-and-pencil tests, etc.) some group differences appear. Students nominated as creative did not rate themselves lower on self-attitudes than noncreative students as might be anticipated, in line with the common view of the creative individual. However, the self-attitude test contained separate traits, and the total score may obscure differences. This is suggested by the item analysis of 17 of the adjectives, some of which were found to discriminate in the direction of more positive self-evaluations on the part of the creative students. Some traits may be more vital to creativity than others. Perhaps the traits themselves may not be as significant as the individual's reaction to his self-evaluation.

Almost all students, creative and noncreative, revealed some discrepancy between the way they saw themselves and the way they would like to be. The distinction between the creative and noncreative person may rest not in the score itself but in the student's response to the self-estimate. The discrepancy may act to constrict or inhibit the creative productivity of the noncreative student, while it either stimulates or at least does not impede the creative individual's push ahead. It now seems essential to consider self-evaluations in a more sensitive manner, treating specific traits rather than an over-all score.

The greater social confidence and popularity of the student nominated as creative


also does not support the common concep- 
tion of the antisocial creator. However, 
interest in and capacity for social inter- 
action do not preclude moments of intense 
isolation and concentration so necessary for 
creative work. Rather, the creative student 
appears to be capable of both concentration 
and socialization. It must be acknowledged 
that the teacher nomination itself may be 
a form of sociability index. Thus, we may 
be studying a group of sociable creators. 
Whether another group of quiet, seclusive 
creators also exists we cannot say on the 
basis of this study, although it suggests a 
direction for future research. With regard 
to the pronounced sociability differences, a 
number of possible explanations exist: 

1. Lack of social confidence may inhibit 
the realization of a creative potential. 

2. Lack of social confidence may prevent 
a functioning creativity from being noticed 
and shared by others. 

3. The social confidence of the creative 
students may be a superficial skill, and not 
identical with sympathy and sensitivity. 

4. The manifestation of creative en- 
deavors itself may increase the individual’s 
social confidence; the lack of special talents 
may diminish the potential, in one im- 
portant area, for gaining social confidence. 

The significant intellective differences
between creative and noncreative boys
(with negligible differences in the case of
girls) and the greater number of boys
placed in the noncreative group (particu-
larly in the 10th grade) suggest either the
existence of a nominator bias in the selec-
tion of the groups, or perhaps a creativity
difference between the sexes. Further work
with a more rigorous nomination procedure
seems necessary. However, when we ex-
amine the discrepancy between the intelli-
gence of creative and noncreative boys, it
would seem improbable that this difference
is the major factor determining creativity.

What some of the other factors are can 
only be suggested by this research. The 
background data raise the possibility that 
a family of somewhat higher educational 




level may play a role in stimulating cre- 
ativity. Further consideration of the intel- 
lectual climate of the home is indicated. 


SUMMARY AND CONCLUSIONS 


This study has attempted to examine the 
validity of certain assumptions regarding 
the creative personality in a high school 
population. One hundred and twenty-six 
Ss were nominated by their teachers as 
either creative or noncreative, and the two 
groups were compared with regard to self- 
attitudes, sociability, and certain back- 
ground information. The results indicated 
that the student selected as creative 
emerged as a rather sociable individual 
evaluating himself as more confident in his 
relationships with people, more popular 
and creative as viewed by his peers than 
his noncreative counterpart. The creative 
student did not differ in over-all self-atti- 
tudes from the noncreative. It was sug- 
gested that a number of trait combinations 
might be equally conducive to creativity. 
The study does indicate that in the case 
of high school students two factors asso- 
ciated with creativity appear to be social 
confidence and parents who have attained 




a somewhat higher educational level. A 
number of directions for future research 
were suggested. 


REFERENCES 


Barron, F. Some personality correlates of independence of judgment. J. Pers., 1953, 21, 287-297.

Bills, R. E., Vance, E. L., & McLean, O. S. An index of adjustment and values. J. consult. Psychol., 1951, 15, 257-261.

Drevdahl, J. E. Factors of importance for creativity. J. clin. Psychol., 1956, 12, 21-26.

Flanagan, J. C. The critical incident technique. Psychol. Bull., 1954, 51, 327-358.

Gough, H. G. Reference handbook for the Gough Adjective Check List. Berkeley: Univer. Calif. Inst. of Pers. Assessment and Res., April, 1955.

Stein, M. I. Creativity and culture. J. Psychol., 1953, 36, 311-322.

Stein, M. I., & Meer, B. Perceptual organization in a study of creativity. J. Psychol., 1954, 37, 39-43.

Wilson, R. C., Guilford, J. P., & Christensen, P. R. The measurement of individual differences in originality. Psychol. Bull., 1953, 50, 362-370.


Received July 27, 1958. 




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 4, 1959


RATIONAL AND MATHEMATICAL RELATIONSHIPS OF SIX SCORING PROCEDURES APPLICABLE TO THREE-CHOICE ITEMS¹


JACK C. MERWIN

Syracuse University


Multiple-choice items are found on practically all standardized paper-and-pencil tests and on an increasing number of teacher-made examinations. One procedure stands out prominently in the widespread use of multiple-choice items on both published and teacher-made tests: the student is instructed to select what he considers to be the right, or best, answer and is given a score of 1 if he selects the keyed response and a score of 0 if he selects one of the other alternatives. This procedure is based only on acceptance or rejection of the keyed response, and has led to suggestions for alternative procedures (Coombs, Milholland, & Womer, 1956; Dressel & Schmid, 1953) which are intended to obtain more information in an attempt to improve the validity of the test.

This paper reports an investigation of the validity of three-choice items when each of six procedures is used. Rational and mathematical relationships existing among the six procedures were investigated, and item characteristics for which each of the six procedures will yield relatively high and relatively low item validity coefficients are described.


SIX SCORING SCHEMES


The six scoring schemes studied arise from the use of different numbers of item response patterns in combination with different weightings. Schemes using two, three, and six response patterns were considered. The weightings used in the schemes were consecutive integral weights and weights which maximize the correlation of the scores with the criterion.

¹ Adapted from the author's doctoral dissertation, “A Mathematical Study of Factors [illegible] of Multiple-Choice [illegible],” completed at the University of Illinois under the guidance of Lee J. Cronbach.

If each subject (S) is instructed to indi- 
cate by his response the relative attractive- 
ness, to him, of the alternatives of a three- 
choice item (using ranking or sequential 
response procedures), there are just six 
different response patterns available to 
him. For example, if the alternatives are
“a,” “b,” and “c,” the possible response
patterns are “abc,” “acb,” “bac,” “cab,”
“bca,” and “cba.” In scoring the item, the
tester can utilize all of the information ob- 
tained in the S's response and assign one of 
six scores. He also has the option of disre- 
garding some of the information available 
in the S's response and assigning one of a 
lesser number of scores, the number of 
scores depending on the amount of infor- 
mation he uses. The relationships among 
the information used in the three scoring 
patterns investigated in this study are 
diagrammed in Fig. 1. It is to be recognized 
that the two-score pattern is the one con- 
ventionally used when the Ss are instructed 
to merely indicate their first choice. 
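The way the six response patterns collapse into three and then two scoring classes can be sketched in a few lines of Python. This is an illustration only; the choice of “a” as the keyed alternative and the helper names `keyed_rank` and `keyed_first` are assumptions made for the example.

```python
from itertools import permutations

ALTERNATIVES = ("a", "b", "c")
KEYED = "a"  # assumption for illustration: "a" is the keyed alternative

# Every complete ranking of the three alternatives, most attractive first.
patterns = ["".join(p) for p in permutations(ALTERNATIVES)]


def keyed_rank(pattern, keyed=KEYED):
    """Three-score information: the rank (1, 2, or 3) given to the keyed alternative."""
    return pattern.index(keyed) + 1


def keyed_first(pattern, keyed=KEYED):
    """Two-score information: whether the keyed alternative is the first choice."""
    return pattern[0] == keyed


for p in patterns:
    print(p, "keyed rank:", keyed_rank(p), "keyed first:", keyed_first(p))
```

The six patterns fall into three classes of two patterns each under `keyed_rank`, and into two classes (two patterns against four) under `keyed_first`, which is the nesting of information the three scoring patterns rely on.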

When a six-score scheme is used, all of
the information the S gives about the at-
tractiveness of each alternative relative to
the others is used. Thus, if “a” is the “best”
alternative (or right answer), the person
who gives the response pattern “bca” will
receive a score different from the person
who gives the response pattern “cba,” be-
cause in the use of the six-score scheme
the alternatives are ranked on the key.
That is, one of the alternatives must be
keyed first, and one of the remaining two
must be keyed second. In the example
above, the tester must decide whether “c”
or “b” is the “second best” alternative. A
three-score scheme can be set up in such a




Six Scores     Three Scores     Two Scores

abc            “a” first        “a” first
acb

bac            “a” second
cab                             “b” or “c” first
bca            “a” third
cba

FIG. 1. PATTERNS OF RESPONSE TO BE ASSIGNED SCORES


way that only one alternative need be 
keyed. 

In the three-score scheme studied, the 
information the S reveals about the attrac- 
tiveness of the nonkeyed alternatives (here- 
after called the distractors) relative to one 
another is not used in assigning scores. 
Only the information on the attractiveness 
of the keyed response relative to each of the 
distractors is used. The three scores used 
in this three-score scheme are assigned on 
the basis of the rank given to the keyed 
response. An S's score depends on whether 
he selects neither, just one, or both dis- 
tractors ahead of the keyed response. Thus, 
with this three-score scheme, the response 
patterns used in the illustration above, 
“bca” and “cba,” receive the same score,
since the keyed response, “a,” is ranked
third in both of them. 

In the conventional two-score scheme 
the concern is only with whether or not the 
keyed response is the first choice. Here, 
only information of the attractiveness of 
the keyed response relative to the most 
attractive distractor is used. This informa- 
tion can be obtained either by having the 
S make a single selection from among the 
alternatives or by having him order the 
alternatives to indicate their attractiveness 
to him.

Scoring schemes using two, three, or six 
response patterns could be used with a 
wide variety of weightings for the different 
score patterns. For this study, consecutive 




integral weights were selected because they 
simplify computation and interpretation; 
they have generally been found to be satis- 
factory, and they provide a base of depar- 
ture to study the gains in efficiency that 
can be obtained through the use of weights 
which maximize the correlation between 
item scores and the criterion. 

Guttman (1941) has pointed out that the 
use of the mean criterion score of those 
persons who mark each response pattern 
as the weight for that response pattern 
maximizes the correlation of the item with
the criterion. This leads to an index which 
is comparable to the correlation ratio, eta. 
Since this procedure yields the highest pos- 
sible item-criterion correlation coefficient 
for any given number of scores, it was 
deemed appropriate for inclusion in this 
study of the relative efficiency of scoring 
schemes. The results for any item, however,
will be maximal only for the group used to
establish the weights. Some shrinkage in
the validity coefficient would be expected
under cross-validation.
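The weighting rule attributed to Guttman can be illustrated numerically. The sketch below is not taken from the study: the twelve (response pattern, criterion score) pairs are hypothetical values invented for the example, and the function names are assumptions. It assigns each response pattern the mean criterion score of the persons giving that pattern, and compares the resulting item-criterion correlation with the correlation ratio and with a consecutive-integer scoring based on the rank of the keyed alternative “a.”

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration: (response pattern, criterion score).
responses = [
    ("abc", 9), ("abc", 8), ("acb", 7), ("acb", 9),
    ("bac", 6), ("bac", 5), ("cab", 6), ("cab", 4),
    ("bca", 3), ("bca", 5), ("cba", 2), ("cba", 1),
]


def mean_criterion_weights(data):
    """Weight for each pattern = mean criterion score of persons giving that pattern."""
    totals, counts = defaultdict(float), defaultdict(int)
    for pattern, crit in data:
        totals[pattern] += crit
        counts[pattern] += 1
    return {p: totals[p] / counts[p] for p in totals}


def correlation_ratio(data):
    """Eta: square root of between-pattern over total sum of squares of the criterion."""
    crit = [c for _, c in data]
    grand = sum(crit) / len(crit)
    means = mean_criterion_weights(data)
    ss_between = sum((means[p] - grand) ** 2 for p, _ in data)
    ss_total = sum((c - grand) ** 2 for c in crit)
    return sqrt(ss_between / ss_total)


criterion = [c for _, c in responses]
weights = mean_criterion_weights(responses)
r_weighted = pearson([weights[p] for p, _ in responses], criterion)
eta = correlation_ratio(responses)

# Comparison: integer scores +1, 0, -1 from the rank of the keyed alternative "a".
r_integer = pearson([1 - p.index("a") for p, _ in responses], criterion)

print(f"mean-criterion weights: r = {r_weighted:.3f}, eta = {eta:.3f}")
print(f"integer weights:        r = {r_integer:.3f}")
```

In this sketch the correlation obtained with mean-criterion weights coincides with eta and is at least as large as the integer-weight correlation, as the argument requires. Because the weights are fitted to the same group on which the correlation is computed, the figure is an upper bound for that group, which is why shrinkage is expected on cross-validation.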


PROCEDURE 


The six scoring schemes studied were compared on the basis of item-criterion correlation coefficients obtained for a variety of items with different parameters.

The validity of an item under any procedure will be determined by the extent to which people high on the criterion get high scores and people low on the criterion get low scores. This information can be expressed in terms of the proportions of groups at different criterion levels who receive each score. In this study, these proportions are the item characteristics which were varied and will be referred to as item parameters.

If the proportions of the criterion groups receiving each score (i.e., item parameters) are known, it is possible to calculate the coefficient of correlation between item scores and the criterion. This coefficient is conventionally considered the “item validity coefficient” and is thought of as a characteristic of the item. In this study, the concern was with the variety of item-criterion correlation coefficients obtained when different scoring procedures are used with the same item and the same criterion. It was deemed inappropriate to consider the many different coefficients obtained for a given item with the criterion as “item validity coefficients.” Any one of the item-criterion correlation coefficients is an index of what will be called the efficiency of the item under the given scoring procedure. The product-moment correlation coefficient between item scores and the criterion for an item under a given scoring procedure will hereafter be referred to as the efficiency index of the item under the scoring procedure.

The relationships reported are based on differences in the efficiency indexes obtained with the various scoring schemes for the same sets of item parameters. It should be recognized that the same difference in efficiency indexes at varying positions along the efficiency scale may not represent equal differences in the predictive value of the item. The six procedures were compared on the basis of the resulting efficiency when applied to three-choice items used to predict membership in two or three equally populated criterion groups. Only items which are positively correlated with the criterion were studied.

In a study of this kind it is not necessary to construct or administer items. The relationships sought were those between item parameters and the resulting efficiency indexes for the various scoring schemes. These relationships were established by systematically varying the item parameters and studying the effects on the efficiency indexes.
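The calculation of an efficiency index from item parameters alone can be sketched as follows. This is one reading of the procedure, not the author's own computation: it assumes equally populated criterion groups coded −1, 0, and 1, and the item parameters shown are hypothetical values chosen for illustration. The function name `efficiency_index` is likewise an assumption.

```python
from math import sqrt


def efficiency_index(proportions, criterion_levels):
    """Product-moment r between item score and criterion.

    proportions maps each item score to the proportions of the criterion
    groups receiving that score, in the order given by criterion_levels.
    Criterion groups are assumed to be equally populated.
    """
    k = len(criterion_levels)
    e_s = e_c = e_ss = e_cc = e_sc = 0.0
    for score, props in proportions.items():
        for level, p in zip(criterion_levels, props):
            w = p / k  # joint probability of (score, criterion level)
            e_s += w * score
            e_c += w * level
            e_ss += w * score * score
            e_cc += w * level * level
            e_sc += w * score * level
    cov = e_sc - e_s * e_c
    return cov / sqrt((e_ss - e_s ** 2) * (e_cc - e_c ** 2))


LEVELS = (-1, 0, 1)  # low, middle, and high criterion groups

# Hypothetical item parameters, invented for illustration.
three_score_item = {1: (0.20, 0.50, 0.80), 0: (0.30, 0.30, 0.15), -1: (0.50, 0.20, 0.05)}
two_score_item = {1: (0.20, 0.50, 0.80), 0: (0.80, 0.50, 0.20)}

# The same three-score item with scores 3, 2, 1 instead of +1, 0, -1.
shifted_item = {score + 2: props for score, props in three_score_item.items()}

print(round(efficiency_index(two_score_item, LEVELS), 3))
print(round(efficiency_index(three_score_item, LEVELS), 3))
print(round(efficiency_index(shifted_item, LEVELS), 3))
```

Because the index is a product-moment correlation, shifting every score by a constant leaves it unchanged, so any set of consecutive integers gives the same efficiency index for a given set of item parameters.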


RELATIONSHIPS AMONG INTEGER SCORING SCHEMES


The relationships among scoring schemes which utilize two, three, and six consecutive integer scores were studied. The increase in efficiency (as shown by the item-criterion correlation coefficients) obtained by the use of an integer scoring scheme which utilizes more information and yields a larger number of scores than another scheme is directly dependent upon the criterion-relevance of the added information used in assigning scores. If the additional information utilized increases the scores of the high criterion people somewhat more than it increases the scores of the low criterion people, its use in assigning scores will increase the efficiency of the item. The use of additional information to assign a larger number of scores can lead to more, the same, or less efficiency than that obtainable by using less information and a lesser number of scores.


Three Integer Scores vs. Two Integer Scores 


The relationship between the efficiency 
of using three integer scores and the effi- 
ciency of using two integer scores can be 
shown for a variety of item parameters. 
The item parameters will determine 
whether the use of three integer scores re- 
sults in more, the same, or less efficiency 
than that obtained when the more con- 
ventional two-score scheme is used. 

Consider the relationships between the 
two scoring schemes when the two-score 
scheme has the conventional +1 and 0 
scores and the three-score scheme assigns 
scores of +1, 0, and —1 as the keyed re- 
sponse is ranked first, second, and third, 
respectively. The selection of these particu- 
lar consecutive integer scores is for con- 
venience of calculation and comparison; 
any other set would lead to the same re- 
sults. As shown in Table 1, in both the 
two-score and the three-score schemes, a
score of +1 is assigned to persons selecting
the keyed alternative first. With the two-
score scheme, everyone not receiving a
score of +1 receives a score of 0. With the 
three-score scheme, the people who did not 
choose the keyed response first are split 
into two groups. One of these groups, those 
who selected the keyed response second, 
would receive a score of 0 just as they 




TABLE 1

SCORES ASSIGNED UNDER TWO INTEGER SCORING SCHEMES WHEN “a” IS THE KEYED ALTERNATIVE

               Two-Score Scheme   Three-Score Scheme
“a” first            +1                  +1
“a” second            0                   0
“a” third             0                  −1


would under the two-score scheme. The 
only persons who would receive different 
scores under the two scoring schemes are 
those who select both distractors ahead of 
the keyed alternative. These people receive 
a score of 0 under the two-score scheme 
and a score of —1 under the three-score 
scheme. 
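The two integer scoring rules just described can be written out directly. The following sketch assumes, as in the example, that “a” is the keyed alternative and that each response is recorded as a three-letter ranking; the function names are illustrative.

```python
def two_score(pattern, keyed="a"):
    """Conventional scoring: 1 if the keyed alternative is ranked first, else 0."""
    return 1 if pattern[0] == keyed else 0


def three_score(pattern, keyed="a"):
    """+1, 0, or -1 as the keyed alternative is ranked first, second, or third."""
    return 1 - pattern.index(keyed)


patterns = ["abc", "acb", "bac", "cab", "bca", "cba"]

# Patterns whose score changes when the three-score scheme replaces the two-score scheme.
differs = [p for p in patterns if two_score(p) != three_score(p)]
print(differs)
```

Only the two patterns that place the keyed alternative third are scored differently by the two schemes, so any gain or loss in efficiency must come from separating those persons from the ones who rank the keyed alternative second.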

Three Criterion Groups. A wide variety 
of sets of item parameters were used to 
study the relative efficiency of integer 
scoring schemes using two and three scores. 
Whether the three-score scheme will be 
more efficient, equally efficient, or less 
efficient than the two-score scheme for a 
given set of item parameters is indicated 
by the correlation between the scores of 
0 and −1 and the criterion scores for people
who do not select the keyed response first. 
Three examples will serve to indicate the 
relationship between the magnitude of this 
correlation coefficient and the relative 
efficiency of the two scoring schemes. 

The efficiency indexes for the two scoring 
schemes will be the same when there is a 
small positive correlation between the 0 
and —1 scores and the criterion. That is, 
the mean criterion level of those persons 
receiving a score of 0 must be slightly 
higher than the mean criterion level of
persons receiving a score of —1 for the two 
efficiency indexes to be equal. The size of 
the correlation which is required to make 
the two indexes equal was found to be in 

part a function of the mean criterion score 
of persons who rank the keyed response 
first and receive a score of +1 under both 
schemes. Table 2 gives the item parameters 




for a case in which the efficiency indexes 
are the same figure, .29. The correlation of 
the 0 and —1 scores from the three-score 
scheme with the criterion is .14. 
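The correlation of the 0 and −1 scores with the criterion, computed only for persons who do not rank the keyed response first, can be reproduced from the Table 2 parameters. The sketch below is one reading of that computation: it assumes equally populated criterion groups coded −1, 0, and 1, takes the proportions receiving scores of 0 and −1 as (.26, .18, .08) and (.15, .06, .02), and uses them as weights. The function name `weighted_pearson` is an assumption.

```python
from math import sqrt


def weighted_pearson(cells):
    """Pearson r for (score, criterion, weight) cells; weights need not sum to 1."""
    total = sum(w for _, _, w in cells)
    mean_s = sum(s * w for s, _, w in cells) / total
    mean_c = sum(c * w for _, c, w in cells) / total
    cov = sum((s - mean_s) * (c - mean_c) * w for s, c, w in cells) / total
    var_s = sum((s - mean_s) ** 2 * w for s, _, w in cells) / total
    var_c = sum((c - mean_c) ** 2 * w for _, c, w in cells) / total
    return cov / sqrt(var_s * var_c)


LEVELS = (-1, 0, 1)  # low, middle, and high criterion groups

# Table 2 proportions for persons who did NOT rank the keyed response first.
table2_not_first = {0: (0.26, 0.18, 0.08), -1: (0.15, 0.06, 0.02)}

cells = [
    (score, level, p)
    for score, props in table2_not_first.items()
    for level, p in zip(LEVELS, props)
]
r_subgroup = weighted_pearson(cells)
print(round(r_subgroup, 2))
```

Under these assumptions the subgroup correlation comes out at about .14, in agreement with the value given for this case.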

If the correlation of the 0 and —1 scores 
from the three-score scheme with the cri- 
terion is positive and relatively large, the 
three-score scheme will be superior. Once 
one of the distractors has been selected 
first, Ss are faced with a two-choice situa- 
tion. They must make a selection between 
the keyed response and the remaining dis- 
tractor. If the results of this “two-choice 
item” are highly enough correlated with 
the criterion to more than compensate for 
the difference in scoring under the two pro- 
cedures, the three-score procedure will be 
superior. Table 3 presents an illustration
of the conditions under which the three-
score scheme yields the higher index of
efficiency. In this example, the efficiency of
the item under the three-score scheme is
.81 and under the two-score scheme it is
.53. The correlation of the 0 and −1 scores
from the three-score scheme with the cri-
terion is .77.

The two-score scheme will be superior when the correlation of the 0 and −1 scores from the three-score scheme with the criterion is negative, zero, or just slightly positive. Table 4 presents an example in which the two-score scheme is superior. In this case the item under the three-score scheme has an efficiency index of .45 and under the two-score scheme the efficiency index is .53. The correlation of the 0 and −1 scores from the three-score scheme with the criterion is .07 for this case.

For the three examples given above, then, correlation coefficients of .14, .77, and .07 between the 0 and −1 scores from the three-score scheme and the criterion resulted in the three-score scheme being equally, more, and less efficient, respectively.

It is possible for the efficiency index of an item under the three-score scheme to surpass the index of the two-score scheme by as much as .41. Such a case is illustrated



TABLE 2

ITEM PARAMETERS FOR WHICH THE EFFICIENCY INDEXES FOR THE TWO-SCORE AND THREE-SCORE SCHEMES ARE EQUAL

(a) Proportion of Each Criterion Group Receiving Each Score Under the Three-Score Scheme

                 Criterion Group
Score         −1       0       1
   1         .59     .76     .90
   0         .26     .18     .08
  −1         .15     .06     .02
Efficiency (r) = .29

(b) Proportion of Each Criterion Group Receiving Each Score Under the Two-Score Scheme

                 Criterion Group
Score         −1       0       1
   1         .59     .76     .90
   0         .41ᵃ    .24     .10
Efficiency (r) = .29

ᵃ Proportion receiving scores of 0 or −1 under the three-score scheme.


TABLE 3

ITEM PARAMETERS FOR WHICH THE THREE-SCORE SCHEME IS SUPERIOR TO THE TWO-SCORE SCHEME

(a) Proportion of Each Criterion Group Receiving Each Score Under the Three-Score Scheme

                 Criterion Group
Score         −1       0       1
   1         .00     .05     .50
   0         .05     .70     .50
  −1         .95     .25     .00
Efficiency (r) = .81

(b) Proportion of Each Criterion Group Receiving Each Score Under the Two-Score Scheme

                 Criterion Group
Score         −1       0       1
   1         .00     .05     .50
   0        1.00ᵃ    .95     .50
Efficiency (r) = .53

ᵃ Proportion receiving scores of 0 or −1 under the three-score scheme.


TABLE 4

ITEM PARAMETERS FOR WHICH THE TWO-SCORE SCHEME IS SUPERIOR TO THE THREE-SCORE SCHEME

(a) Proportion of Each Criterion Group Receiving Each Score Under the Three-Score Scheme

                 Criterion Group
Score         −1       0       1
   1         .00     .05     .50
   0         .50     .50     .30
  −1         .50     .45     .20
Efficiency (r) = .45

(b) Proportion of Each Criterion Group Receiving Each Score Under the Two-Score Scheme

                 Criterion Group
Score         −1       0       1
   1         .00     .05     .50
   0        1.00ᵃ    .95     .50
Efficiency (r) = .53

ᵃ Proportion receiving scores of 0 or −1 under the three-score scheme.




TABLE 5

ITEM PARAMETERS FOR WHICH THE EFFICIENCY INDEX UNDER THE THREE-SCORE SCHEME IS .41 GREATER THAN THE INDEX UNDER THE TWO-SCORE SCHEME

(a) Proportions of Each Criterion Group Receiving Each Score Under the Three-Score Scheme

                 Criterion Group
Score         −1       0       1
   1         .00     .33     .33
   0         .00     .67     .67
  −1        1.00     .00     .00
Efficiency (r) = .74

(b) Proportions of Each Criterion Group Receiving Each Score Under the Two-Score Scheme

                 Criterion Group
Score         −1       0       1
   1         .00     .33     .33
   0        1.00ᵃ    .67     .67
Efficiency (r) = .33

ᵃ Proportion receiving scores of 0 or −1 under the three-score scheme.


in Table 5. In this case, the efficiency index 
of the item under the three-score scheme 
is .74, and the index under the two-score 
scheme is .33. 

Two Criterion Groups. An evenly divided 
dichotomous criterion was used to seek 
further generalizations about the relative 
efficiency of items under the two-score and 
three-score schemes. The main finding of 
this part of the study was that the three- 
score scheme will more often than not lead 
to greater efficiency than the two-score 
scheme. More specifically, when two equal 
sized criterion groups are considered, the 
following can be concluded: (a) If the 
probability that persons low on the cri- 
terion will rank one of the distractors first 
is equal to the probability that they will
rank the other one first, the three-score 
scheme will be more efficient than the two- 
score scheme for nearly all possible re- 
sponses from persons high on the criterion. 
(b) If persons low on the criterion who do 
not rank the keyed response first have a 
large probability of ranking it second, the 
three-score scheme will be more efficient 
than the two-score scheme only when 
persons high on the criterion who do not 
rank the keyed response first also have a 
large probability of ranking it second. (c) 
If persons low on the criterion have a very
low probability of choosing the keyed response
first, the three-score scheme will be
more efficient unless persons high on the
criterion have a very high or very low
probability of choosing the keyed response
first.

In any comparison of product-moment correlation coefficients as measures of
efficiency for the two scoring schemes, the dependency of the maximum possible
coefficient on the number of criterion groups must be considered. The two-score
scheme cannot yield a coefficient of 1.00 when more than two criterion groups
are utilized, and the three-score scheme cannot yield an index of perfect
efficiency when more than three criterion groups are used. For three equally
populated criterion groups, the maximum coefficient for the three-score scheme
is, of course, 1.00; for the two-score scheme it is .866. When considering the
relative efficiency of the two scoring schemes with three equal sized criterion
groups, this limit on the efficiency index should be considered. When two
criterion groups are used, the maximum possible efficiency index is 1.00 for
both schemes.
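
The efficiency index used throughout can be checked numerically. The sketch below is only an illustration, not the author's computing procedure: the function name `efficiency` and the table layout are assumptions, and the criterion groups are taken to be equally populated, as in the text. It computes the product-moment correlation between item score and criterion for the Table 3 parameters and for the two-score split that separates the lowest group from the other two.

```python
import math

def efficiency(table, scores, criteria):
    """Product-moment r between item score and criterion value.

    table[i][j] is the proportion of criterion group j receiving scores[i];
    the criterion groups are assumed to be equally populated.
    """
    k = len(criteria)
    # Joint weight of every (score, criterion) cell.
    cells = [(s, c, table[i][j] / k)
             for i, s in enumerate(scores)
             for j, c in enumerate(criteria)]
    mean_s = sum(s * w for s, _, w in cells)
    mean_c = sum(c * w for _, c, w in cells)
    cov = sum((s - mean_s) * (c - mean_c) * w for s, c, w in cells)
    var_s = sum((s - mean_s) ** 2 * w for s, _, w in cells)
    var_c = sum((c - mean_c) ** 2 * w for _, c, w in cells)
    return cov / math.sqrt(var_s * var_c)

groups = [-1, 0, 1]
table3_three = [[.00, .05, .50], [.05, .70, .50], [.95, .25, .00]]
table3_two = [[.00, .05, .50], [1.00, .95, .50]]
# Hypothetical item: everyone in the two upper groups scores 1, nobody in the lowest.
best_two = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]

r_three = efficiency(table3_three, [1, 0, -1], groups)
r_two = efficiency(table3_two, [1, 0], groups)
r_best_two = efficiency(best_two, [1, 0], groups)
print(round(r_three, 2), round(r_two, 2), round(r_best_two, 3))
```

Under these assumptions the first two values agree with the indexes reported in Table 3, and the third agrees with the .866 ceiling for two scores and three equal groups.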


Six Integer Scores vs. Three Integer Scores 


The use of six consecutive integer scores involves the use of all the
information utilized in the three-score scheme plus information on the relative
attractiveness of the distractors to each other. With “a” keyed as the best
answer and “b” keyed as the “second best” answer, the response patterns “abc,”
“acb,” “bac,” “cab,” “bca,” and “cba” might be assigned scores of 5, 4, 3, 2,
1, and 0, respectively. Such assignment of scores is arbitrary, and any set of
consecutive integers would lead to the same results.

Whether the six-score scheme will lead to an index of efficiency which is
larger, the same as, or smaller than that obtainable through the use of the
three-score scheme is determined by the relationship of the additional
information utilized in assigning scores to the criterion. Thus, if distractor
“b” is keyed “second best” and distractor “c” is keyed “third best,” the extent
to which high criterion persons rank “b” ahead of “c” and low criterion persons
rank “c” ahead of “b” will determine the possible increase in efficiency of the
six-score scheme over the three-score scheme. A possible, though atypical,
example which is heavily loaded in favor of the six-score scheme can serve to
point up the fact that the gains in efficiency possible through the use of six
integer scores are limited.

Consider two criterion groups for which the only criterion-relevant
information is to be found in the relative ranking of the two distractors
(Table 6). Both groups rank the keyed response first, second, and third with
equal frequency, the low criterion group always ranking “c” ahead of “b” and
the high criterion group always ranking “b” ahead of “c”. For this set of item
parameters the predictive efficiency of the two-score and three-score schemes
would be .00. For the six-score scheme the index of efficiency would be .29.
Thus, even with item parameters which are extremely favorable to the six-score
scheme, and highly unlikely in practice, the gain in efficiency is only .29.
A general study of a variety of sets of item parameters indicates that
relatively little gain in efficiency can be expected through the use of six
integer scores rather than three integer scores.


TABLE 6
ITEM PARAMETERS FOR WHICH THE EFFICIENCY INDEXES FOR THE SIX-SCORE,
THREE-SCORE, AND TWO-SCORE SCHEMES ARE .29, .00, AND .00, RESPECTIVELY

                      Criterion Group
Response Pattern        0       1
      abc              .00     .33
      acb              .33     .00
      bac              .00     .34
      cab              .34     .00
      bca              .00     .33
      cba              .33     .00
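
The three indexes quoted for Table 6 can be reproduced with a short calculation. This is a sketch under stated assumptions: the two criterion groups are equally populated, the low group gives only the patterns “acb,” “cab,” and “cba” and the high group only “abc,” “bac,” and “bca” (proportions .33, .34, .33), and the helper names `pearson` and `index_for` are hypothetical.

```python
import math

def pearson(cells):
    """Weighted product-moment r; cells are (score, criterion, weight) triples."""
    ms = sum(s * w for s, _, w in cells)
    mc = sum(c * w for _, c, w in cells)
    cov = sum((s - ms) * (c - mc) * w for s, c, w in cells)
    vs = sum((s - ms) ** 2 * w for s, _, w in cells)
    vc = sum((c - mc) ** 2 * w for _, c, w in cells)
    return cov / math.sqrt(vs * vc)

patterns = ["abc", "acb", "bac", "cab", "bca", "cba"]
low = [.00, .33, .00, .34, .00, .33]    # criterion group 0
high = [.33, .00, .34, .00, .33, .00]   # criterion group 1

six_score = dict(zip(patterns, [5, 4, 3, 2, 1, 0]))

def three_score(p):
    # Keyed response "a" ranked first, second, third -> 1, 0, -1.
    return 1 - p.index("a")

def two_score(p):
    return 1 if p[0] == "a" else 0

def index_for(score_of):
    cells = []
    for p, lo, hi in zip(patterns, low, high):
        cells.append((score_of(p), 0, lo / 2))
        cells.append((score_of(p), 1, hi / 2))
    return pearson(cells)

r_six = index_for(six_score.get)
r_three = index_for(three_score)
r_two = index_for(two_score)
print(round(r_six, 2), round(abs(r_three), 2), round(abs(r_two), 2))
```

Because both groups rank the keyed response first, second, and third equally often, the two-score and three-score indexes vanish, while the six-score index comes out near .29.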


CONTRIBUTION OF MAXIMIZING 
WEIGHTINGS 


The procedure given by Guttman for 
determining weights which will maximize 
the item-criterion correlation coefficient 
was discussed earlier. We will now con- 
sider the contribution of such weights to 
the efficiency of the item. 


The Two-Score Scheme 


The product-moment correlation coeffi- 
cient, used in this study as the index of the 
efficiency of an item under a given scoring 
scheme, is not affected by the size of the 
scores when just two scores are used. Thus, 
the simple use of integer scores is just as 
efficient, but no more efficient, than any 
other weighting that might be used for the 
two scores. 


The Three-Score Scheme 


The use of weights which maximize the 
correlation of item scores with the criterion 
adds most to the efficiency of the item 
when the relationship between the item
responses and the criterion deviates from
linearity as much as possible. These are
also the conditions under which the three- 
score integer scheme has the greatest supe- 
riority over the two-score integer scheme. 




We can examine a case where maximizing 
weights make an extreme contribution to 
the efficiency of the item. 

Consider three equal sized criterion 
groups with the item parameters given in 
Table 5. The low criterion group always 
ranks the keyed response third. The middle
and top groups both rank the keyed re- 
sponse first with a probability of .33 and 
second with a probability of .67, never 
ranking it third. Under these conditions, 
the efficiency index for the item for the 
three-score scheme with maximizing 
weights is .87. If three integer scores were 
used, the index would be .74, and if two 
scores were used, it would be .33. Thus, 
under conditions most favorable to the 
efficiency of maximizing weights, the in- 
crease in efficiency of the maximizing 
weights over integer weights is only .13, 
and the increase in efficiency of three inte- 
ger scores over two integer scores is .41. 
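
The .87 figure can be checked with a brief sketch. It rests on a standard result rather than on Guttman's own tabulation routine: the weights that maximize the correlation of a categorical score with a criterion are, up to a linear transformation, the mean criterion values of the persons receiving each score. The function name `max_weight_index` is hypothetical, and the criterion groups are assumed to be equally populated.

```python
import math

def max_weight_index(table, criteria):
    """Return (weights, r) for maximally weighted scores.

    table[i][j] is the proportion of criterion group j receiving score i.
    Each weight is the mean criterion value among persons given that score.
    """
    k = len(criteria)
    mean_c = sum(criteria) / k
    var_c = sum((c - mean_c) ** 2 for c in criteria) / k
    weights, between = [], 0.0
    for row in table:
        p = sum(row) / k                      # overall proportion with this score
        if p == 0:
            weights.append(None)              # score never given; weight is immaterial
            continue
        w = sum(c * q for c, q in zip(criteria, row)) / (k * p)
        weights.append(w)
        between += p * (w - mean_c) ** 2
    return weights, math.sqrt(between / var_c)

groups = [-1, 0, 1]
table5_three = [[.00, .33, .33],   # keyed response ranked first
                [.00, .67, .67],   # ranked second
                [1.00, .00, .00]]  # ranked third
weights, r_max = max_weight_index(table5_three, groups)
print(weights, round(r_max, 2))
```

For the Table 5 parameters the first two scores receive the same weight, so the maximally weighted three-score scheme collapses to a split between the low group and the other two, which is why its index equals the .866 ceiling noted earlier for two scores.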

A study of a variety of sets of item parameters indicates that the use of
maximizing weights can lead to only relatively small increases in the
efficiency of an item over that obtained when three integer scores are used.


The Six-Score Scheme


The use of a six-score scheme with max- 
imizing weights for the responses represents 
the maximum use of all the information 
that is obtained when subjects rank the 
responses of a three-choice item. While 
other schemes may yield indexes (for spec- 
ifiable sets of item parameters) as large as 
that obtained by this scheme, under no 
conditions can they be larger. 

The set of item parameters under which the use of six maximally weighted
scores will produce the greatest gain in efficiency over other procedures is
that set presented earlier (Table 6) to compare the efficiency of six integer
scores to the efficiency of three integer scores. In this case, all of the
information about two criterion groups is in the ranking of the two
distractors, the low criterion group always ranking “c” ahead of “b,” and the
high criterion group always ranking “b” ahead of “c.” As was pointed out
earlier, this set of item parameters leads to an efficiency index of .00 for
both a two-score and a three-score scheme, and this would be true regardless of
the weightings used. Six integer scores would yield a validity coefficient of
.29, and six scores maximally weighted would give a coefficient of 1.00. With
these particular item parameters the advantage of this procedure over the
others is substantial, but this particular set of parameters is not to be
expected in practice. The only conditions under which the use of maximizing
weights can lead to an efficiency index of 1.00 are those in which each
response pattern is given by persons from no more than one criterion group.

A study of a number of different sets of
item parameters indicates that for sets of
item parameters to be expected in practice,
the efficiency of an item under the other 
procedures will not be much less than that 
obtained through the use of six maximally 
weighted scores. 


SUMMARY 


In the conventional multiple-choice procedure, a subject receives a score of 1
if he chooses the keyed alternative, otherwise he receives a score of 0. This
study compares the relative efficiency of three-choice items with specified
characteristics when used with six different scoring procedures. The
product-moment correlation coefficient between item scores and a criterion was
used as the index of efficiency.

Scoring schemes which utilize two, three, and six consecutive integer scores
and a set of two, three, and six scores weighted to produce maximum correlation
with the criterion were compared. They were compared on the bases of (a) the
information used in assigning scores and (b) the efficiency with which they can
be used to predict membership in two or three equally populated criterion
groups.

The following conclusions were drawn from the study:

1. The use of added information to assign an increased number of integer
scores does not necessarily lead to greater efficiency of an item. It can lead
to greater, identical, or lesser efficiency than that obtainable by using a
smaller number of integer scores.

2. If added information is used to assign more integer scores, the resulting
efficiency will be a function of the criterion-relevance of the additional
information utilized.

3. The use of three integer scores will more often than not lead to greater
efficiency than can be obtained with two integer scores when two equal-sized
criterion groups are used.

4. The use of six integer scores rather than three integer scores will not, in
general, lead to a substantial increase, and can lead to a decrease, in
efficiency.

5. The use of six maximally weighted scores will always lead to item
efficiency which is as high as, or higher than, that obtained by any other
procedure. However,

differences between the index from this 
procedure and indexes from other proce- 
dures are relatively small for sets of item 
parameters to be expected in practice. 
Shrinkage of cross-validation would be 
expected to make these differences even 
smaller. 


REFERENCES 


Coombs, C. H., Milholland, J. E., & Womer, F. B. The assessment of partial
knowledge. Educ. psychol. Measmt., 1956, 16, 13-37.

Dressel, P. L., & Schmid, J. Some modifications of the multiple-choice item.
Educ. psychol. Measmt., 1953, 13, 574-595.

Guttman, L. Mathematical and tabulation techniques. In P. Horst (Ed.), The
prediction of personal adjustment. New York: Social Science Research Council,
1941. Pp. 251-346.


Received August 5, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 4, 1959


INTERPRETATION OF DIFFERENCES AMONG AVERAGES AND
INDIVIDUAL TEST SCORES


FREDERICK B. DAVIS 
Hunter College 


The interpretation of test scores fre- 
quently involves differences among scores, 
among averages, and among scores and 
averages. For example, a school psycholo- 
gist or counselor may wish to compare 
scores on the same test that were obtained 
by two different pupils, to compare the 
mean scores on the same test obtained by 
two particular classes, or to compare a pu- 
pil’s score with the mean score of his class 
on the same test. Such comparisons involve 
differences among scores, differences that 
can be interpreted scientifically only if their 
standard errors of measurement are known. 
As long ago as 1923, the standard error of 
measurement of a difference between two 
individual standard scores was presented 
(Kelley, 1923); in 1928, the standard error 
of measurement of the mean was provided 
(Huffnaker & Douglass, 1928). But, to the 
best of the writer's knowledge, the standard 
errors of measurement of several useful 
differences between individual and mean
scores have not been published. The main 
purpose of this paper is to present some of 
these, together with their applications and 
derivations. They make possible estimation 
of the probability that a given difference is 
a chance deviation from a true difference of 
zero. 

In practice, differences between test 
scores large enough to have a probability 
of occurring by chance 15 times, or less, out 
of 100 may be regarded as worth interpret- 
ing. This level of significance for the two- 

tailed test of the null hypothesis may seem 
unduly lenient to psychologists accustomed 
to using the .01 or the .05 levels in experi- 
mental work. It must be remembered, how- 
ever, that the designation of any level of 
probability as “significant” is arbitrary and
represents a balance between the test interpreter's
desire to avoid accepting differences as attributable
to something other than chance when in fact they are
not and his desire to avoid attributing differences
to chance when in fact they are not. For
interpreting test scores, several factors suggest
a rather lenient level of significance as
appropriate. First, scores derived from most
achievement and aptitude tests are suffi- 
ciently unreliable as to make their practical 
utility doubtful if only differences among 
individual scores significant at a stringent 
level (such as .01) are interpreted. Second, 
the penalty for accepting a difference as 
owing to something other than chance when, 
in fact, it is a chance deviation from a true 
difference of zero is not usually great, because
test results are ordinarily only one of
several factors entering into the making of
any important decision about a child's
schooling. Practical experience suggests the
.15 level as appropriate for interpreting 
differences among the test scores of school 
children and it is, therefore, most often used 
in this paper. However, the basic equations 
permit choice of any level of significance 
that is desired. 
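
The critical ratio that corresponds to any chosen two-tailed level can be read from the normal curve. The sketch below assumes, as the paper does for large samples, that the normal distribution is an adequate reference; the function name `critical_ratio_for` is hypothetical.

```python
from statistics import NormalDist

def critical_ratio_for(level):
    """Normal deviate cutting off `level` in the two tails combined."""
    return NormalDist().inv_cdf(1 - level / 2)

for level in (0.15, 0.05, 0.01):
    print(level, round(critical_ratio_for(level), 2))
```

The .15 level adopted here corresponds to a critical ratio of about 1.44, against about 1.96 and 2.58 for the .05 and .01 levels.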

For clarity of presentation and for the 
convenience of test users, eight common 
types of differences among scores have been
identified and treated separately with illus- 
trative examples. 


Case 1: A Difference Between the Average of
Several Scores Obtained by One Individual
and His Score on One of the Tests Included
in the Average


Clinical and school psychologists often wish to compare the average of several
scores obtained by one individual with his score on one of the tests included
in the average.¹ For example, an individual's average score on 10 tests in the
Wechsler Adult Intelligence Scale (excluding the Vocabulary test) may be
compared with his score on any one of the tests included in the average.
Table 1 shows the scaled scores obtained by an 18-year-old boy on the WAIS.
Whether the difference between this boy's average scaled score of 9.5 and any
one of his scaled scores on the separate tests is statistically significant can
be determined at any desired level by comparing each difference with its own
standard error of measurement. The required standard errors of measurement are
presented in Table 1 together with the minimum deviation of each part score
from the average that may be regarded as significant at the .05 level. The data
indicate that only the boy's scaled score of 13 on the Similarities test
deviates sufficiently from his average scaled score on 10 tests to warrant the
inference that the deviation is attributable to something other than chance. It
should be noted that if the 10 differences in a given individual's set of WAIS
scores were independently determined by chance, one difference in every two
sets would be expected to show significance at the .05 level. When at least two
differences in a given set are found to be significant at that level, it is
unlikely that their occurrence can be attributed to chance, even though there
is not complete independence in their determination.

¹ Wechsler (1944, Ch. 11) has discussed comparisons of this sort. More
recently, he has discussed comparisons of the sort covered by Case 2 in this
paper (Wechsler, 1958, Ch. 11).


TABLE 1
PART SCORES OF AN 18-YEAR-OLD BOY ON WAIS, AND DATA PERTAINING TO STANDARD
ERRORS OF MEASUREMENT OF DIFFERENCES BETWEEN AVERAGE AND PART SCORES(a)

                                       Deviation   Standard Error    Minimum Deviation From
                              Scaled   From        of Measurement    Average Significant
Test                          Score    Average     of Difference(b)  at .05 Level
 1. Information                10       +0.5       [illegible]       [illegible]
 2. Comprehension              11       +1.5       [illegible]       [illegible]
 3. Arithmetic                  9       -0.5        1.30              ±2.55
 4. Similarities               13       +3.5        1.08              ±2.12
 5. Digit Span                  7       -2.5        1.52              ±2.98
 7. Digit Symbol                9       -0.5         .87              ±1.71
 8. Picture Completion          9       -0.5        1.14              ±2.23
 9. Block Design               11       +1.5        1.12              ±2.20
10. Picture Arrangement         9       -0.5        1.59              ±3.12
11. Object Assembly             7       -2.5        1.54              ±3.02
    Sum of scaled scores       95
    Average scaled score        9.5
 6. Vocabulary                 11       +1.5         .81              ±1.59

(a) The standard errors of measurement of differences and the minimum
deviations from an individual's average that are significant at the .05 level
are generally applicable and may be used to interpret the WAIS scores of any
18- to 19-year-old. As a matter of fact, they are serviceable in the
interpretation of scores of individuals at any age for which the test is
appropriate.
(b) Based on standard errors of measurement given in the Manual (Wechsler,
1955, p. 13).

The standard error of measurement of 
the difference between an average score 
obtained by any individual chosen at ran- 
dom from the group tested and any one of 
the scores that entered into his average may 


be estimated by Equation [1]:

$$ s_{\mathrm{meas}\,(T/m - Z_p)} = \sqrt{\frac{s^2_{\mathrm{meas}\,T}}{m^2} + \left(\frac{m-2}{m}\right) s^2_{\mathrm{meas}\,Z_p}} \qquad [1] $$

where $s^2_{\mathrm{meas}\,T}$ is the variance error of measurement of the sum
of the m part scores, m is the number of parts for which the scores are
comparable, T/m is the average of the comparable part scores, and
$s^2_{\mathrm{meas}\,Z_p}$ is the variance error of measurement of any one of
the comparable part scores. Sometimes it is convenient to obtain
$s^2_{\mathrm{meas}\,T}$ by summing the variance errors of measurement of the
part scores, thus making use of the fact that

$$ s^2_{\mathrm{meas}\,T} = \sum_{p=1}^{m} s^2_{\mathrm{meas}\,Z_p} $$

Whether the difference between an indi- 
vidual’s average of m comparable scores 
and his score on one of the m parts is sig- 
nificantly different from zero at any de- 
sired level may be estimated with service- 
able accuracy when the standard error of 
measurement was computed in a large 
sample by means of the critical ratio: 


$$ CR = \frac{(T/m - Z_p) - 0}{s_{\mathrm{meas}\,(T/m - Z_p)}} \qquad [2] $$
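
As a rough illustration of Equations [1] and [2], the sketch below codes the standard error of the difference and applies the critical ratio to a subset of rows from Table 1 (deviation from the average of 9.5 and standard error of measurement of the difference, as printed there). The function and variable names are hypothetical, and 1.96 is the usual large-sample critical ratio for the .05 level.

```python
import math

def se_average_vs_included_part(s2_meas_total, m, s2_meas_part):
    """Equation [1]: SE of measurement of (T/m - Z_p) for a part inside the average."""
    return math.sqrt(s2_meas_total / m ** 2 + (m - 2) / m * s2_meas_part)

def critical_ratio(difference, se_difference):
    """Equation [2]: observed difference over its standard error of measurement."""
    return (difference - 0) / se_difference

# (deviation from average, SE of measurement of the difference), from Table 1
table1 = {
    "Arithmetic": (-0.5, 1.30),
    "Similarities": (3.5, 1.08),
    "Digit Span": (-2.5, 1.52),
    "Digit Symbol": (-0.5, 0.87),
    "Picture Completion": (-0.5, 1.14),
    "Block Design": (1.5, 1.12),
    "Picture Arrangement": (-0.5, 1.59),
    "Object Assembly": (-2.5, 1.54),
}

significant = [name for name, (dev, se) in table1.items()
               if abs(critical_ratio(dev, se)) >= 1.96]
print(significant)
```

Only the Similarities deviation yields a critical ratio beyond 1.96, in agreement with the reading of Table 1 given in the text.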


Differences between an average and 
scores on any one of the tests included in it 
are convenient to interpret only if the 
separate tests yield comparable scores. 
The latter are defined, for purposes of this 
paper, as transformed obtained scores for 
which the corresponding points in the dis- 
tributions of true scores are exceeded by 
the same percentages of examinees in a de- 
fined sample. The transformation is made 
simply to cause the comparable scores to 
be numerically identical. It is not necessary that comparable scores be
measures of the same ability or that they be equally reliable. Serviceable
approximations to comparable scores, as so defined, may be obtained by the
method described by Flanagan (1951). They have been provided, though by a
different method, for scores on the parts of the WAIS.

Derivation of Equation [1]. Let T repre- 
sent an individual's total score made up of 
the sum of m part scores, any one of which 
may be denoted as $T_i$. Then

$$ T = A + B + \cdots + M $$

and

$$ \frac{T}{m} = \frac{A + B + \cdots + M}{m}, $$

the average of the part scores. Suppose that 
an indefinitely large number (n) of equiva- 
lent forms of the m parts were administered 
to one pupil chosen at random from a group 
for which the tests are appropriate, and 
postulate that the pupil's true score in 
each of the abilities measured remained 
constant throughout the testing. A distri- 
bution of the n differences between the 
pupil’s n average scores and his n scores on 
any given part would be produced. These 
differences would be normally distributed around a mean approaching
$(T_\infty/m) - T_{i\infty}$, the difference between the pupil's true mean and
true score (the subscript $\infty$ marking a true score). Since it was
postulated that the pupil's true scores would remain constant,
$\sigma^2_{T_\infty/m} = \sigma^2_{T_{i\infty}} = 0$. Consequently, the
population variance of the distribution of differences may be written as:

$$ \sigma^2_{\mathrm{meas}\,((T/m) - T_i)} = \sigma^2_{T_e/m} + \sigma^2_{T_{ie}} - 2\,\sigma_{T_e/m}\,\sigma_{T_{ie}}\,\rho_{(T_e/m)\,T_{ie}} \qquad [3] $$


where the subscript e denotes error of measurement, $\sigma^2$ denotes a
population variance, and $\rho$ denotes the product-moment correlation
coefficient in the population. It can easily be shown that

$$ \rho_{(T_e/m)\,T_{ie}} = \frac{\sigma_{T_{ie}}}{\sigma_{T_e}} \qquad [4] $$

Hence, Equation [3] may be simplified to:

$$ \sigma^2_{\mathrm{meas}\,((T/m) - T_i)} = \frac{\sigma^2_{T_e}}{m^2} + \left(\frac{m-2}{m}\right)\sigma^2_{T_{ie}} \qquad [5] $$

In practice, if the variance errors of measurement have been computed on the
basis of a large sample, they may be used in place of the population variance
errors of measurement in Equation [5]. If scores on all parts of total score T
in Equation [5] are comparable, we may denote them with the capital letter Z
and write:


$$ s^2_{T_e} = s^2_{\mathrm{meas}\,T} = s^2_{(Z_{Ae} + Z_{Be} + \cdots + Z_{Me})} = s^2_{Z_{Ae}} + s^2_{Z_{Be}} + \cdots + s^2_{Z_{Me}} = \sum_{p=1}^{m} s^2_{\mathrm{meas}\,Z_p} $$

since the intercorrelations of the errors of measurement in separate tests
approach zero if an indefinitely large number of tests are administered under
proper conditions. By definition,
$s^2_{\mathrm{meas}\,T_i} = s^2_{T_i}(1 - r_{T_i T_i'})$. The transformation
of raw scores to comparable scores, as defined above, has no appreciable effect
on their correlations with other variables; hence,
$r_{T_i T_i'} = r_{Z_p Z_p'}$ if we assume that transformation to have no
effect on the reliability coefficient of part score $T_i$ of total score T.
Then:

$$ s^2_{\mathrm{meas}\,Z_p} = s^2_{Z_p}(1 - r_{T_i T_i'}) = s^2_{Z_p}\,\frac{s^2_{\mathrm{meas}\,T_i}}{s^2_{T_i}} $$

By making appropriate substitutions of these values in Equation [5],
Equation [1] is easily obtained.


Case 2: A Difference Between the Average of
Several Scores Obtained by One Individual
and His Score on a Test Not Included in
the Average


The Vocabulary scaled score of the boy whose scores are presented in Table 1
was excluded from the computation of his average scaled score of 9.5. Hence,
the product-moment correlation of errors of measurement in the Vocabulary score
and in the average scaled score would be insignificantly different from zero if
an indefinitely large number of equivalent forms of the test were administered
to him. If we assume this correlation to be zero, as is customary, the analogue
of Equation [3] for the population variance error of measurement of this
difference between the average of several scores obtained by an individual and
his score on a test not included in the average may be written as:

$$ \sigma^2_{\mathrm{meas}\,((T/m) - V)} = \sigma^2_{T_e/m} + \sigma^2_{V_e} $$


This leads to Equation [6], which may be 
used to estimate the standard error of 
measurement of a difference of this kind 
when a large sample is used in computing 
the variance errors of measurement: 


$$ s_{\mathrm{meas}\,((T/m) - Z_V)} = \sqrt{\frac{s^2_{\mathrm{meas}\,T}}{m^2} + s^2_{\mathrm{meas}\,Z_V}} \qquad [6] $$


The value yielded by Equation [6] may 
be used in a critical ratio analogous to 
Equation [2] for determining the statistical 
significance of a difference between the 
average of several scores obtained by one 
individual and his score on a test not included
in the average. It should be recalled
that such differences are inconvenient to 
interpret unless all of the tests yield com- 
parable scores. 
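
A minimal sketch of Equation [6] follows, together with the critical ratio for the Vocabulary row of Table 1 (deviation +1.5, standard error of measurement of the difference .81). The function name is hypothetical, and the arguments in the first call are made-up values chosen only to exercise the formula; they are not WAIS figures.

```python
import math

def se_average_vs_excluded_test(s2_meas_total, m, s2_meas_outside):
    """Equation [6]: SE of measurement of (T/m - Z_V) for a test outside the average."""
    return math.sqrt(s2_meas_total / m ** 2 + s2_meas_outside)

# Hypothetical inputs: four parts whose error variances sum to 16, outside test variance 3.
demo_se = se_average_vs_excluded_test(16.0, 4, 3.0)

# Vocabulary row of Table 1.
vocabulary_cr = 1.5 / 0.81
significant_05 = abs(vocabulary_cr) >= 1.96
significant_15 = abs(vocabulary_cr) >= 1.44
print(demo_se, round(vocabulary_cr, 2), significant_05, significant_15)
```

The Vocabulary deviation falls short of the .05 criterion but exceeds the more lenient .15 criterion that the paper recommends for interpreting school test scores.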


Case 3: A Difference Between the Average
Score of Several Specified Individuals on
One Test and the Score on the Same Test
of One of the Individuals Included in the
Average

At the end of the school year, the 10 pupils in Miss Black's small fifth-grade
class obtained the following grade scores on the Paragraph Meaning Test of the
Stanford Achievement Test, Intermediate Form J: 97, 81, 72, 64, 60, 58, 51, 47,
43, 35. The average grade score was 60.80, and the standard deviation of scores
was 17.70. The standard error of measurement of the difference between any one
of these 10 scores, chosen at random, and their average may be estimated by
Equation [7]:

$$ s_{\mathrm{meas}\,(\bar{X} - X_i)} = s_{\mathrm{meas}\,X}\sqrt{\frac{N-1}{N}} \qquad [7] $$

where N is the number of pupils whose scores were averaged, and a large sample
was used in computing the standard error of measurement.

The standard error of measurement re- 
ported by the publisher of the Stanford 
Paragraph Meaning Test is 6 grade-score 
points in a sample of 243 fifth-grade pupils 
in which the mean grade score was about 
58 and the standard deviation was about 
19 grade-score points. For the difference 
between the mean grade score of the 10 
pupils and the score of any one pupil in 
Miss Black's class, Equation [7] yields a 
standard error of measurement of 5.70. To 
determine the smallest difference signifi- 
cant at the .15 level, 5.70 may be multi- 
plied by 1.44, which is the approximate 
critical ratio when the standard error of 
measurement has been computed in a 
sample as large as 243. The result (rounded 
to the nearest tenth) is 8.2. Thus, only 
scores of 97, 81, 72, 51, 47, 43, and 35 are 
found to be significantly different at the 
.15 level from the average of 60.80.
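
The Case 3 arithmetic can be retraced in a few lines. The sketch takes Equation [7] in the form $s_{\mathrm{meas}\,X}\sqrt{(N-1)/N}$, the publisher's standard error of measurement of 6 grade-score points, and the critical ratio of 1.44; all variable names are hypothetical.

```python
import math
import statistics

scores = [97, 81, 72, 64, 60, 58, 51, 47, 43, 35]
s_meas = 6.0      # publisher's standard error of measurement
cr_15 = 1.44      # critical ratio for the .15 level, large sample

mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)                  # divides by N, as in the text
n = len(scores)
se_diff = s_meas * math.sqrt((n - 1) / n)       # Equation [7]
min_diff = cr_15 * se_diff
significant = [x for x in scores if abs(x - mean) >= min_diff]

print(round(mean, 2), round(sd, 2), round(se_diff, 2), round(min_diff, 1))
print(significant)
```

Evaluated this way, Equation [7] gives about 5.69, marginally below the 5.70 quoted in the text; the rounded minimum difference of 8.2 and the list of significantly deviating scores are unaffected.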

Derivation of Equation [7]. Let $X_{1A}, X_{1B}, \ldots, X_{1N}$ represent the
raw scores of each of N pupils on Form 1 of Test X, and let $\bar{X}_1$
represent the mean of these scores. Assume that an indefinitely large number
(n) of equivalent forms of the test are administered to the same N pupils, and
postulate that the true score of each pupil remains constant throughout the
testing. Then $X_{2A}, X_{3A}, \ldots, X_{nA}$ represent additional raw scores
of Pupil A, etc.; and
$\bar{X}_2, \bar{X}_3, \ldots, \bar{X}_n$ represent additional mean
scores of the N pupils. For any pupil, say 
Pupil A, an essentially normal distribution 
of differences will be produced between the 
mean of the n means and the pupil's scores 
on successive forms of the test. The mean of 
this distribution of differences will approach $\bar{X}_\infty - X_{\infty A}$,
where $\bar{X}_\infty$ is the true mean of the N pupils and $X_{\infty A}$ is
the true score of Pupil A. Since it was postulated that the true score of each
pupil would remain constant throughout the testing, the variance of the
distribution of differences is generated only by errors of measurement and may
be written as:

$$ \sigma^2_{\mathrm{meas}\,(\bar{X} - X_A)} = \sigma^2_{\bar{X}_e} + \sigma^2_{X_{eA}} - 2\,\sigma_{\bar{X}_e}\,\sigma_{X_{eA}}\,\rho_{\bar{X}_e X_{eA}} \qquad [8] $$

where $\sigma^2_{\bar{X}_e}$ is the population variance error of measurement
of the mean, $\sigma^2_{X_{eA}}$ is the population variance error of
measurement of Pupil A's score, and $\rho_{\bar{X}_e X_{eA}}$ is the
correlation of the errors of measurement in the n successive means and the n
successive raw scores of Pupil A. This correlation may be expressed as:

$$ \rho_{\bar{X}_e X_{eA}} = \frac{1}{\sqrt{N}} \qquad [9] $$

The variance error of measurement of the mean may be written as (Huffnaker &
Douglass, 1928):

$$ \sigma^2_{\bar{X}_e} = \frac{\sigma^2_X (1 - \rho_{XX'})}{N} = \frac{\sigma^2_{\mathrm{meas}\,X}}{N} \qquad [10] $$

It is well known that for any pupil chosen at random from an indefinitely
large sample

$$ \sigma^2_{X_{eA}} = \sigma^2_{\mathrm{meas}\,X} = \sigma^2_X (1 - \rho_{XX'}) \qquad [11] $$

If Equations [9], [10], and [11] are substituted in Equation [8], the latter
reduces to:

$$ \sigma_{\mathrm{meas}\,(\bar{X} - X_A)} = \sigma_{\mathrm{meas}\,X}\sqrt{\frac{N-1}{N}} \qquad [12] $$

where N is the number of pupils whose scores were averaged.

The population variance error of measurement may be estimated as

$$ \hat{\sigma}^2_{\mathrm{meas}\,X} = s^2_{\mathrm{meas}\,X}\,\frac{N_p}{N_p - 1} $$

where $N_p$ is the number of pupils in the sample in which
$s^2_{\mathrm{meas}\,X}$ was computed. It is interesting to note that if
$N_p = N$, Equation [12] becomes

$$ \hat{\sigma}_{\mathrm{meas}\,(\bar{X} - X_A)} = s_{\mathrm{meas}\,X} $$

In practice, if $s_{\mathrm{meas}\,X}$ has been computed on the basis of a
large sample, it may be used in place of $\sigma_{\mathrm{meas}\,X}$ in
Equation [12] to obtain Equation [7].


Case 4: A Difference Between the Average
Score of Several Specified Individuals on
One Test and the Score of an Individual
Not Included in the Average


A pupil not in Miss Black's class was tested at about the same time as the
pupils of her class with the same test. His grade score was 69. The standard
error of measurement of the difference between his score of 69 and the mean
score of the 10 pupils in Miss Black's class may be estimated by Equation [13]:

$$ s_{\mathrm{meas}\,(\bar{X} - X_j)} = s_{\mathrm{meas}\,X}\sqrt{\frac{N+1}{N}} \qquad [13] $$

Since the standard error of measurement of the Paragraph Meaning Test is 6
grade-score points, Equation [13] yields a standard error of measurement of 6.3
for the difference between the average score of 60.80 and the score of the
pupil not included in the average. The smallest difference significant at the
.15 level is, therefore, 9.1 points (rounded to the nearest tenth), and we
cannot conclude that the pupil's grade score of 69 is significantly different
from the average score in Miss Black's class of 10 pupils.

Equation [13] is derived in the same manner as Equation [7], but the analogue
of Equation [8] used in its derivation includes a covariance term that equals
zero. Therefore, Equations [7] and [13] differ.
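
The Case 4 figures can be retraced in the same way as those of Case 3. The sketch assumes Equation [13] has the form $s_{\mathrm{meas}\,X}\sqrt{(N+1)/N}$, which is what a zero covariance term in the analogue of Equation [8] implies; the variable names are hypothetical.

```python
import math

s_meas = 6.0          # standard error of measurement, Paragraph Meaning Test
n = 10                # pupils in the averaged class
class_mean = 60.80
outside_score = 69
cr_15 = 1.44          # critical ratio for the .15 level

se_diff = s_meas * math.sqrt((n + 1) / n)      # Equation [13]
min_diff = cr_15 * se_diff
observed = abs(outside_score - class_mean)
is_significant = observed >= min_diff

print(round(se_diff, 1), round(min_diff, 1), round(observed, 1), is_significant)
```

The observed difference of 8.2 points falls short of the minimum of about 9.1, so the outside pupil's score cannot be called significantly different from the class average at the .15 level.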


Case 5: A Difference Between the Average
Score of a Representative Sample of a Defined
Population and the Score on the
Same Test of a Particular Individual Not
Included in the Sample


A twelfth-grade pupil, tested early in November, obtained a scaled score of 68
in Speed of Comprehension on the Davis Reading Test, Series 1. To determine
whether this score differs significantly from the average of the norms group of
twelfth-grade pupils, Equation [14] may be used:

$$ s_{(\bar{X} - X_P)} = \sqrt{\frac{s^2_X}{N - 1} + \frac{N_p\, s^2_{\mathrm{meas}\,X}}{N_p - 1}} \qquad [14] $$

where $s^2_X$ and N are, respectively, the variance and number of pupils in a
representative sample from a defined population, $X_P$ is the score of a
particular individual not included in this sample, and $N_p$ is the number of
pupils in the sample used to compute $s^2_{\mathrm{meas}\,X}$.

The norms group of twelfth-grade pupils 
tested in September and October of 1957 
included 5596 boys and girls in which the 
mean and standard deviation of Speed of 
Comprehension scores were 71.7 and 7.1, 
respectively. The standard error of meas- 
urement is given in the test manual as 2.9 
in a sample of 1096 twelfth-grade pupils in 
which the mean and standard deviation 
were, respectively, 73.5 and 7.3. For these 
data, Equation [14] yields a value of 2.90. 
Since the smallest deviation from the 
average score of 71.7 in the norms group 
that is significant at the .15 level is, there- 
fore, 4.2 (rounded to the nearest tenth), we 
cannot conclude that the pupil’s score of 
68 differs significantly from the average 
score of pupils at his grade level. 
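
The Case 5 figures can be approximated as follows. This is a sketch resting on an assumed reading of Equation [14]: the sampling variance of the norms-group mean, taken here as $s^2_X/(N-1)$, is added to the estimated population variance error of measurement, $s^2_{\mathrm{meas}\,X} N_p/(N_p-1)$. The function and variable names are hypothetical.

```python
import math

def se_sample_mean_vs_individual(s_x, n, s_meas, n_p):
    """Assumed form of Equation [14]: sampling variance of the mean plus
    the estimated variance error of measurement of one individual's score."""
    return math.sqrt(s_x ** 2 / (n - 1) + s_meas ** 2 * n_p / (n_p - 1))

se = se_sample_mean_vs_individual(s_x=7.1, n=5596, s_meas=2.9, n_p=1096)
min_diff = 1.44 * se                      # .15 level
observed = abs(71.7 - 68)
is_significant = observed >= min_diff

print(round(se, 2), round(min_diff, 1), round(observed, 1), is_significant)
```

With a norms group this large the first term is negligible, so the result is essentially the standard error of measurement of the individual's own score, as the text goes on to note.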

It is obvious that Equation [14] has gen- 
eral utility for establishing the significance 
of a difference between the mean score of a 
sample drawn at random from a defined 
population and the score of a particular 
individual. It may be noted that when the 
sample is very large, the value yielded by 
Equation [14] will be essentially the same 
as the standard error of measurement of 
the individual’s obtained score. The vari- 
ance of means in Equation [14] is that of 
the means of successive samples drawn at 
random from a defined population, whereas 
in Equation [13] it is that of the means of 
the same sample of individuals on successive 
equivalent forms of the same test. 


Case 6: A Difference Between the Mean 
Scores of One Individual on n Equivalent 
Forms of Test X and on m Equivalent 
Forms of Test Y, When Scores on Tests 
X and Y Are Comparable 


To increase accuracy of measurement,
more than one of the equivalent forms of
a test may be administered to one individual.
It is well known that the standard
error of measurement of the mean score on
n such forms of, say, Test X is:


$$ s_{meas_{\bar{X}}} = \frac{s_{meas_x}}{\sqrt{n}} \qquad [15] $$

It can easily be shown that the standard 
error of measurement of the difference be- 
tween the mean of n equivalent forms of 
Test X administered to one individual and 
the mean of m equivalent forms of Test Y 
administered to the same individual or to 
another individual can be estimated by 
Equation [16]: 


$$ s_{meas_{(\bar{X} - \bar{Y})}} = \sqrt{\frac{s_{meas_x}^2}{n} + \frac{s_{meas_y}^2}{m}} \qquad [16] $$


If the numbers of cases on which the
variance errors of measurement in Equation
[16] are based are large, the value yielded by
this equation may be used in a critical ratio
analogous to Equation [2] for estimating
the statistical significance of the difference
between two means of equivalent forms.
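
A minimal sketch of the two computations in this case follows. It assumes Equation [15] divides the standard error of measurement by the square root of the number of forms and Equation [16] sums the two resulting variance errors; the function names and the numerical values are hypothetical illustrations, not data from the article.

```python
import math


def se_meas_of_mean(s_meas, n_forms):
    """Standard error of measurement of a mean over n equivalent forms
    (assumed form of Equation [15])."""
    return s_meas / math.sqrt(n_forms)


def se_meas_of_mean_difference(s_meas_x, n, s_meas_y, m):
    """Standard error of measurement of the difference between the mean of
    n forms of Test X and the mean of m forms of Test Y
    (assumed form of Equation [16])."""
    return math.sqrt(s_meas_x ** 2 / n + s_meas_y ** 2 / m)


# Hypothetical values: four forms of each test.
se_x_mean = se_meas_of_mean(3.0, 4)
se_diff = se_meas_of_mean_difference(3.0, 4, 4.0, 4)

print(se_x_mean)  # 1.5
print(se_diff)    # 2.5
```

Dividing an observed difference between the two means by `se_diff` gives the critical ratio described above.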


Case 7: A Difference Between the Mean 
Scores on the Same Test of Two Particular 
Partially Overlapping Groups 


Miss Black’s fifth-grade class of 10 pu- 
pils is one of 14 fifth-grade classes in Smith- 
ville. The mean grade score of the 245 pu- 
pils in these classes was 64.00 when they 
were tested with the Stanford Paragraph 
Meaning Test on the same date. Whether 
the difference between the mean score of 
these particular 245 pupils in Smithville 
and the mean score of 60.80 in Miss Black’s 
class is significant can be determined by 
comparing the difference of 3.20 grade-score 
points with the estimated standard error of 
measurement of the difference between the 
means of these partially overlapping groups. 
This standard error of measurement may 
be estimated by Equation [17]: 


$$ s_{meas_{(\bar{X}_T - \bar{X}_S)}} = s_{meas_x} \sqrt{\frac{N_p}{N_p - 1}\left(\frac{1}{N_S} - \frac{1}{N_T}\right)} \qquad [17] $$


where $\bar{X}_T$ and $\bar{X}_S$ are the means and $N_T$
and $N_S$ the numbers of cases in the total
group and subgroup, respectively; and
$s_{meas_x}$ is the standard error of measurement
computed on the basis of $N_p$ cases having
a mean and standard deviation similar to
those in the total group.

The standard error of measurement for
the Stanford Paragraph Meaning Test is
given by the publisher as 6 grade-score
points in a sample of 243 fifth-grade pupils.
Equation [17] yields a value of 1.84, and
the t ratio is 1.74, with 242 degrees of freedom.
Hence, we conclude that the difference
between the mean of Miss Black's
class and the mean of all pupils in the 14
classes (including Miss Black's) is significant
at better than the .10 level.

Derivation of Equation [17]. Suppose that
an indefinitely large number (n) of equivalent
forms of Test X were administered to
every individual in Group T, and postulate
that the true score of each individual remained
constant throughout the testing.
If the mean score of Group T were computed
for each of the n equivalent forms,
n means would be obtained. If the mean
score of Subgroup S of Total Group T were
computed for each of the n equivalent
forms, n means for Subgroup S would likewise
be obtained. An essentially normal
distribution of differences between the two
means for each form would be obtained.
The mean of these differences would approach
$\bar{X}_{tT} - \bar{X}_{tS}$, where the subscript
tT denotes a true score in Group T and the
subscript tS a true score in Subgroup S.
The variance of these differences may be
expressed as:


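$$ \sigma^2_{meas(\bar{X}_T - \bar{X}_S)} = \sigma^2_{\bar{X}_{eT}} + \sigma^2_{\bar{X}_{eS}} - 2\, r_{\bar{X}_{eT}\bar{X}_{eS}}\, \sigma_{\bar{X}_{eT}}\, \sigma_{\bar{X}_{eS}} $$


where the subscript eT denotes error of
measurement in Group T and eS error
of measurement in Subgroup S. In terms of
individual scores, this may be written


$$ \sigma^2_{meas(\bar{X}_T - \bar{X}_S)} = \frac{1}{n} \sum \left( \frac{\sum^{N_T} X_{eT}}{N_T} - \frac{\sum^{N_S} X_{eS}}{N_S} \right)^2 $$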


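which will be found equal to:


$$ \sigma^2_{meas(\bar{X}_T - \bar{X}_S)} = \frac{\sum^{N_T} \sigma^2_{eT}}{N_T^2} + \frac{\sum^{N_S} \sigma^2_{eS}}{N_S^2} - \frac{2 \sum^{N_S} \sigma_{eT}\, \sigma_{eS}}{N_T N_S} \qquad [18] $$


Assuming that $\sigma_e$ is the same for every
individual in Group T and that $\sigma_{eT} = \sigma_{eS}$,
Equation [18] may be simplified to:


$$ \sigma^2_{meas(\bar{X}_T - \bar{X}_S)} = \sigma^2_{meas_x} \left( \frac{1}{N_S} - \frac{1}{N_T} \right) $$


from which Equation [17] may be obtained.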
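
The simplification can be checked exactly with a small sketch. It assumes, as a labeled simplification, that every pupil has the same error variance and that errors of different pupils are uncorrelated; the function name is hypothetical. Each pupil in Subgroup S then enters the difference between the two means with weight 1/N_T - 1/N_S, and every other pupil in Group T with weight 1/N_T.

```python
from fractions import Fraction


def variance_factor_overlapping(n_total, n_sub):
    """Sum of squared weights on the individual errors in
    (mean of Group T) - (mean of Subgroup S), where S is part of T.

    Multiplying this factor by the common error variance gives the
    variance error of measurement of the difference between the means.
    """
    w_sub = Fraction(1, n_total) - Fraction(1, n_sub)  # pupils in S
    w_rest = Fraction(1, n_total)                      # pupils in T but not S
    return n_sub * w_sub ** 2 + (n_total - n_sub) * w_rest ** 2


# Group sizes from Case 7: 245 pupils in all, 10 in the subgroup.
factor = variance_factor_overlapping(n_total=245, n_sub=10)
closed_form = Fraction(1, 10) - Fraction(1, 245)

print(factor == closed_form)  # True
```

Exact fractions are used so that the agreement with 1/N_S - 1/N_T is an identity and not a rounding coincidence.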


Case 8: A Difference Between the Mean
Scores of Two Particular Nonoverlapping
Groups on Any Variable


The pupils in Miss Green's class and
Miss Brown's class were given an algebra
test on the same date at the end of the
school year. The tests were carefully
administered by the same examiner,
and records indicate that each pupil
cooperated. The results are summarized
in Table 2. Since the variances of obtained
scores in the two classes are not
significantly different (F = 1.29), we
assume equal variances of the scores
in the population or populations from which
Miss Green's and Miss Brown's classes
were drawn. If we also assume these scores
to be distributed normally in the population
or populations, we may test the difference
between the sample means to see whether
it is significantly different from zero. The
t ratio has as its denominator the standard
error of the difference between the means:


$$ s_{(\bar{X}_G - \bar{X}_B)} = \sqrt{\frac{\sum (X_G - \bar{X}_G)^2 + \sum (X_B - \bar{X}_B)^2}{N_G + N_B - 2} \left( \frac{1}{N_G} + \frac{1}{N_B} \right)} $$


With 62 degrees of freedom, we conclude
that the two classes may well
have been drawn at random from the same
parent population.


TABLE 2
SCORES IN TWO CLASSES

                 Miss Green's    Miss Brown's
                 Class           Class
N                34              30
Mean             28.68           26.50
s_x              11.28           9.95
s_mean           1.97            1.85
r_xx'            .89             .86
s_meas_x         3.72            3.72


This result indicates
that differences between the means of pairs
of classes of the sizes of Miss Green's and
Miss Brown's drawn successively at random
from the population of such classes
could be expected by chance often to be
as large or larger than the difference of
2.18 points between the means of Miss
Green's and Miss Brown's classes. But this
result does not tell us whether a difference
as large as 2.18 points could often occur by
chance between the means of these two
classes if they were tested over and over
with equivalent forms of the test and if the
true difference were zero. Yet this may be
exactly what we want to know. To determine
this, a different t ratio should be employed,
with the estimated standard error
of measurement of the difference between
the means as the denominator. It can easily
be shown that the standard error of measurement
of a difference between the means
of two particular nonoverlapping groups on
any variable, say Test X, may be estimated
as:


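$$ s_{meas_{(\bar{X}_G - \bar{X}_B)}} = s_{meas_x} \sqrt{\frac{N_p}{N_p - 1}\left(\frac{1}{N_G} + \frac{1}{N_B}\right)} \qquad [19] $$

where $s_{meas_x}$ was computed in a sample of
$N_p$ cases and $N_G$ and $N_B$ are the numbers
of cases used in computing $\bar{X}_G$ and $\bar{X}_B$,
respectively.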
The standard error of measurement of
the algebra test given to Miss Green's and
Miss Brown's classes was computed on the
basis of the 64 pupils in both classes, the 
mean errors of measurement being insig- 
nificantly different in the two classes. For 
the difference between the means of the two 
algebra classes, we have t = 2.32, with 62 
degrees of freedom, and we conclude that 
the mean ability in algebra of these two 
particular classes is almost surely different, 
Miss Green’s class being superior. 
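
The t ratio of 2.32 can be retraced from the values in Table 2 with a brief sketch. It assumes Equation [19] has the form s_meas multiplied by the square root of N_p/(N_p - 1) times (1/N_G + 1/N_B); the function name is hypothetical.

```python
import math


def se_meas_diff_nonoverlapping(s_meas, n_p, n_g, n_b):
    """Standard error of measurement of the difference between the means
    of two nonoverlapping groups (assumed form of Equation [19])."""
    return s_meas * math.sqrt(n_p / (n_p - 1) * (1 / n_g + 1 / n_b))


# Table 2: s_meas = 3.72 computed on all 64 pupils; classes of 34 and 30.
se = se_meas_diff_nonoverlapping(s_meas=3.72, n_p=64, n_g=34, n_b=30)
t_meas = (28.68 - 26.50) / se

print(round(se, 2))      # 0.94
print(round(t_meas, 2))  # 2.32
```

The denominator here reflects only errors of measurement in these two particular classes, which is why this t ratio is much larger than one based on the sampling error of the difference between means.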

As is well known, care must be exercised 
in choosing the appropriate denominator 
for the t ratio. In making this choice, the
experimenter should ask this question: Do 
I want to find whether the algebraic signs 
would rather consistently be the same for 
differences between means of scores in suc- 
cessive samples drawn at random from the 
same population, or for differences between 
means of scores on successive equivalent 
forms administered to the same sample? 
If the former is desired, an appropriate 
standard error of the difference between 
means should be employed; if the latter is
desired, an appropriate standard error of
measurement of the difference between 
means should be employed. 


REFERENCES 


Flanagan, J. C. Units, scores, and norms.
In Educational measurement. Washington:
American Council on Education,
1951. Pp. 752-760.

Kelley, T. L. A new method for determining
the significance of differences in intelligence
and achievement scores. J.
educ. Psychol., 1923, 14, 321-333.

Huffaker, C. L., & Douglass, H. R. On
the standard errors of the mean due to
sampling and to measurement. J. educ.
Psychol., 1928, 19, 643-649.

Wechsler, D. Measurement of adult intelligence.
(3rd ed.) Baltimore: Williams
& Wilkins, 1944.

Wechsler, D. Manual for the Wechsler Adult
Intelligence Scale. New York: Psychological
Corp., 1955.

Wechsler, D. Measurement and appraisal
of adult intelligence. (4th ed.) Baltimore:
Williams & Wilkins, 1958.


Received December 8, 1958. 


Journal of Educational Psychology
Vol. 50, No. 4, 1959


EFFECTS OF “BRAINSTORMING” INSTRUCTIONS ON
CREATIVE PROBLEM SOLVING BY TRAINED
AND UNTRAINED SUBJECTS¹


SIDNEY J. PARNES AND ARNOLD MEADOW
University of Buffalo 


Two initial papers have reported the results
of an investigation of the effects of the
brainstorming method on creative thinking
(Meadow & Parnes, 1959; Meadow, Parnes,
& Reese, in press). In this method, a subject
is instructed to attempt to solve problems
by expressing all tentative solutions which
occur to him, postponing judicial evaluation
of those solutions to a subsequent time
period (Osborn, 1957).

The results of the first experiment indicated
that a class group instructed
in creative problem solving
methods (which emphasized brainstorming)
increased its productivity in five of
seven measures of creative ability as compared
with a control group (Meadow & Parnes,
1959). The second study investigated the
effects (in two creative ability problems) of
the brainstorming method as compared
with a “produce good ideas, penalty for
bad ideas” instruction, utilizing the same
subjects for each instruction. Results again
indicated superior performance for the
brainstorming instructions (Meadow et al.,
in press). Both of these experiments employed
subjects who were taking the course
in creative problem solving which emphasizes
these methods (Parnes, 1958).

The present study was designed to determine
whether the brainstorming instructions
would be effective with untrained
students who had never received previous
instruction in the method. Three hypotheses
were formulated: (a) With a group of
untrained subjects, the brainstorming as
compared with nonbrainstorming instructions
will produce a significant increment
in the absolute number of good quality ideas
on two creative ability problems. (b) The
interaction effects found in a previous experiment
(Meadow et al., in press) will be
confirmed. Specifically, a nonbrainstorming
instruction administered first will inhibit
the absolute number of good quality ideas
produced under a brainstorming instruction
which follows. Similarly, a difficult problem
administered first will inhibit the absolute
number of good quality ideas produced in a
relatively easy problem presented in a test
period which follows. (c) If the brainstorming
instructions are administered to a group
of students who have taken a one-semester
course in creative problem solving emphasizing
this technique, and to a group of
students who have not taken the course,
the former group will produce a significantly
greater absolute number of good
quality ideas than the latter group.

¹ This research was financed by a grant
from the Creative Education Foundation.


METHOD 


Subjects. In order to test the first two 
hypotheses, 52 University of Buffalo under- 
graduate students were used as Ss. No S 
had taken the creative problem solving 
course or had received any training in the 
brainstorming method. 

The third hypothesis was tested by comparing
a group of University of Buffalo
undergraduate creative problem solving 
course students with a group of students 
who had not taken the course. Members of 
the two groups were individually matched 
on the basis of the variables of grade point 
average, age, and sex. The matched groups 
each comprised 17 students selected from 
a larger group of 21 creative problem 
solving course students and 26 noncourse
students.

Grade point average as based on 12 credits
of courses averaged 1.21, with a standard
deviation of .50 for the course group,
and 1.21 with a standard deviation of .45 
for the noncourse group. Each individual 
experimental S was closely matched on the 
grade point average variable with each 
control S. The mean of the differences be- 
tween each of the matched pairs was .13. 
The average age of the experimental group 
was 25.50, with a standard deviation of 
9.27; average age of the control group was 
24.06, with a standard deviation of 6.80. 
A t test indicated no significant difference 
between these means. The individual exper- 
imental versus control S matching was less 
precise on the age variable. The mean of the 
differences in age for the matched pairs 
was 6.87 years. There were 11 males and 6 
females in the course group, 13 males and 
4 females in the control group. The pairs 
were not closely matched on the sex vari- 
able; there were 9 same sex pairs and 8 
different sex pairs. This distribution in the 
matching resulted from the a priori con- 
sideration that the grade point average was 
a more crucial variable than that of age or 
sex. Close similarity of matched pairs with 
respect to these latter variables was ac- 
cordingly sacrificed to achieve greater 
similarity on the grade point average vari- 
able. 

The course students differed from the 
noncourse students in having been exposed 
to a one-semester course in creative prob- 
lem solving. The principles studied in this 
course are those described in Osborn's 
textbook (1957). An integral part of the 
teaching method is the repeated practice 
given to students in the utilization of brain- 
storming. A specific example of instructions 
for employing the method is presented in 
the Procedure section. 

Creative problem solving course students
were tested during the last two weeks
of the semester.

Creative problems. The Broom and
Hanger problems used in the present experiment
were selected from Part V of the
AC Test of Creative Ability. This test is
reported to have differentiated groups of
creative from noncreative Ss (Harris &
Simberg, 1954). Data are available from a
previous experiment which indicated that 
scores on the Hanger problem are positively 
correlated with other tests which have been 
designed to measure creative thinking 
(Meadow & Parnes, 1959). Instructions 
for the Hanger problem required Ss to list 
all possible uses for an ordinary wire coat 
hanger. Similar instructions were given for 
the Broom problem. 

The Examiner’s Manual for the AC Test
of Creative Ability describes two different
scores for Part V, containing the Hanger
and Broom problems: a “quantity” score
and a “uniqueness” score. If an S is instructed
to produce ideas without regard
for quality (as under the brainstorming
instruction), he will generally produce a
larger absolute number of ideas than an S
who either implicitly or explicitly is concerned
about the quality of ideas. The
crucial question with respect to evaluating
the brainstorming method is its efficacy in
producing an increment in the number of
creative ideas of good quality.

The decision was accordingly made to
devise a new scoring method for the test
for the purpose of deriving a reliable “creativity
quality” score. This score was based
on two attributes, the first of which was
uniqueness of the idea. Uniqueness of ideation
would seem to represent, however, a
necessary but not sufficient condition for
creativity. Accordingly, the concept of
creativity employed in the present experiment
was defined to include also a second
attribute: value. This latter attribute was
broadly interpreted to include any socially
useful value whether it was economic,
social, or aesthetic in nature.

² Scores on the Hanger problem in a
previous experiment correlated with other
creative ability tests as follows:
Unusual Uses, .473; Guilford’s Plot Titles
High, .452; Guilford's Apparatus, .301;
TAT Originality, .520. All but the Apparatus
correlation were at the .01 level of
significance. The correlation with the Apparatus
test was significant at the .05 level.


In applying this scoring procedure,
each response was copied on a separate slip
of paper, given a code number, and then
given to the rater for evaluation. This
procedure was necessary because Ss invariably
give a larger number of ideas in
response to brainstorming than to nonbrainstorming
instructions. If the rater
were to score responses on the original papers
submitted by Ss, he may consequently become
aware of the experimental conditions under
which the responses were produced. Presentation
of each response on a separate
coded slip of paper eliminates this type of
possible bias.

The rater was instructed to evaluate
each response by two separate criteria:
(a) uniqueness, the degree to which the
response deviated from the conventional
use of the object, and (b) value, the degree
to which the response was judged to have
social, economic, aesthetic, or other usefulness.
The uniqueness attribute was rated
on a three-point scale: one point indicating
little uniqueness, two points indicating
moderate uniqueness, three points indicating
marked uniqueness. Similarly, the
value attribute was rated on a three-point
scale. In testing the hypotheses, these scores were
converted into a two-value scale: “good”
and “bad” responses. A response was
scored as “good” if it represented a
combined rating of markedly or moderately
unique and [markedly or moderately] valuable.

After the responses were individually
scored, they were decoded and a total
“good response” score assigned to each S.
Any response which duplicated in essential
meaning another response given by the
same S was eliminated from the scoring.

The interrater reliabilities for the Hanger
and Broom problems, as determined in
previous experiments, were .74 and .91,
respectively (Meadow & Parnes, 1959;
Meadow et al., in press).

Procedure. To test the first two hypotheses,
the untrained Ss were given first one
problem and then the second immediately 
thereafter. All tests were group adminis- 
tered to each of the four sections separately. 
One problem was given under brainstorm- 
ing instructions and the other under non- 
brainstorming instructions. The brain- 
storming instructions were as follows: 


You are to list all the ideas that come to 
your mind without judging them in any 
way. Forget about the quality of the ideas 
entirely. We will count only quantity on this 
task. Express any idea which comes to your 
mind. As you go along, you may combine 
or modify any of the ideas which you have 
already listed, in order to produce addi- 
tional ideas. Remember that quantity and 
freedom of expression without evaluation 


are the key points. 


The corresponding nonbrainstorming in- 
structions were as follows: 


You are to list all the good ideas you can 
think up. Your score will be the total num- 
ber of good ideas. Don’t put down any idea 
unless you feel it is a good one. 


The Ss were allowed five minutes for each 
problem. 

Half of the Ss were given first the Hanger 
problem and then the Broom problem; the 
other half were given the problems in the 
reverse order. Similarly, half of the Ss were 
given first the brainstorming instructions 
and then the nonbrainstorming instruc- 
tions, and the other half were given the 
instructions in the reverse order. Thus, 
Group 1 was given the Hanger problem 
under brainstorming instructions as its 
first problem, and the Broom problem 
under nonbrainstorming instructions as its 
second problem; Group 2 was given the 
Broom problem under brainstorming in- 
structions as its first problem and the 
Hanger under nonbrainstorming instruc- 
tions as its second problem. Group 3 was 
given the Hanger under nonbrainstorming 
instructions as its first problem and the 
Broom under brainstorming instructions 
as its second problem. Finally, Group 4 was
given the Broom under nonbrainstorming 
instructions as its first problem, and the
Hanger under brainstorming instructions
as its second problem.


TABLE 1
MEAN NUMBER OF “GOOD” SOLUTIONS FOR BRAINSTORMING AND NONBRAINSTORMING
INSTRUCTIONS BY ADMINISTRATIONS, FOR HANGER AND BROOM PROBLEMS COMBINED

                First Administration    Second Administration    Both Administrations
Instructions    N     Mean    σ         N     Mean    σ          N     Mean    σ
B               26    3.73    2.86      26    4.81    3.21       52    4.27    3.06
NB              26    3.01    2.08      26    2.00    1.67       52    2.52    1.95

Note: B refers to brainstorming; NB refers to nonbrainstorming.


TABLE 2
MEAN NUMBER OF “GOOD” SOLUTIONS FOR HANGER AND BROOM PROBLEMS BY
ADMINISTRATION, FOR BRAINSTORMING AND NONBRAINSTORMING
INSTRUCTIONS COMBINED

                First Administration    Second Administration    Both Administrations
Problems        N     Mean    σ         N     Mean    σ          N     Mean    σ
Hanger          26    3.77    2.58      26    ...     3.15       52    3.70    2.85
Broom           26    3.00    2.37      26    3.00    2.64       52    3.00    ...


TABLE 3
SUMMARY OF ANALYSIS OF VARIANCE OF ABSOLUTE NUMBER OF “GOOD” SOLUTIONS:
BRAINSTORMING VS. NONBRAINSTORMING INSTRUCTIONS BY TYPE OF
PROBLEM, BY ADMINISTRATIONS

Source                  df     MS       F        P
Between Ss              51
  I × P                 1      5.08     <1.00    ns
  I × T                 1      29.08    3.01     ns
  P × T                 1      .01      <1.00    ns
  error (b)             48     9.07
Within Ss
  Instructions (I)      1      79.63    26.81    <.01
  Problems (P)          1      16.17    5.44     <.05
  Test Periods (T)      1      .01      <1.00    ns
  I × P × T             1      8.09     2.72     ns
  error (w)             48     2.97
Total                   103

Note: Instructions refers to Brainstorming vs. Nonbrainstorming; Problems refers to Hanger vs. Broom; Test
Periods refers to the first test administration vs. the second administration.

Lindquist's (1953) Type V analysis of
variance was used for statistical analysis.
The three experimental variables, Instructions,
Problems, and Test Periods (first and
second) were all within-Ss factors.

To test the third hypothesis of the
experiment, the number of “good” ideas
on the Hanger problem under brainstorming
instructions for course and noncourse
groups was compared.

All tests were administered in groups
during regular class hours.


RESULTS

The mean numbers of “good” solutions
produced under the two instructions, for
first and second test periods separately,
are presented in Table 1. These data show
that (a) more “good” ideas were produced
under the brainstorming instructions than
under the nonbrainstorming instructions,
and (b) this effect was greater in the
second test period than in the first. Table
2 indicates that more “good” solutions
were produced for the Hanger than for the
Broom problem. Evaluation of these differences
is shown by the results of the analysis
of variance presented in Table 3. The table
indicates that the main effects of instructions
and problems are significant; the
main effect of test periods and all interactions
are not significant.

The mean numbers of “good” ideas on the
Hanger problem for course and noncourse
Ss are presented in Table 4. Inspection
of the table indicates a greater average
number of “good” ideas for the course
group. The difference between these means
is significant at the .01 level.

The correlations between total quantity
of ideas and number of good ideas are presented in
Table 5. The table indicates positive correlations
ranging from .64 to .81 for the
two brainstorming and two nonbrainstorming
groups. All correlations are significant
at the .01 level.
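
The within-Ss F ratios in Table 3 follow from dividing each mean square by the within-Ss error mean square. A minimal sketch of that arithmetic is given below; the critical values of F for 1 and 48 degrees of freedom are approximate figures from standard tables and are an assumption of this sketch, not part of the article.

```python
# Mean squares for the within-Ss effects, as printed in Table 3.
MS_ERROR_WITHIN = 2.97
within_ms = {
    "Instructions": 79.63,
    "Problems": 16.17,
    "I x P x T": 8.09,
}

# Approximate tabled critical values of F(1, 48); assumed, not from the article.
F_CRIT_05 = 4.04
F_CRIT_01 = 7.19

f_ratios = {name: ms / MS_ERROR_WITHIN for name, ms in within_ms.items()}

for name, f in f_ratios.items():
    if f > F_CRIT_01:
        verdict = "significant at the .01 level"
    elif f > F_CRIT_05:
        verdict = "significant at the .05 level"
    else:
        verdict = "not significant"
    print(f"{name}: F = {f:.2f}, {verdict}")
```

The computed ratios agree with the printed values of 26.81, 5.44, and 2.72.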


TABLE 4
COMPARISON OF MEAN NUMBER OF “GOOD”
SOLUTIONS OF MATCHED COURSE AND
NONCOURSE GROUPS UNDER BRAINSTORMING
INSTRUCTIONS (HANGER PROBLEM)

                   N     Mean     σ       Diff.    P
Course group       17    10.24    3.01
                                          4.95     <.01
Noncourse group    17    5.29     3.08


TABLE 5
PEARSON PRODUCT-MOMENT CORRELATIONS
BETWEEN TOTAL QUANTITY OF IDEAS
AND NUMBER OF “GOOD” IDEAS

                            N     r      P
Brainstorming hanger        26    .67    <.01
Brainstorming broom         26    .71    <.01
Nonbrainstorming hanger     26    .64    <.01
Nonbrainstorming broom      26    .81    <.01
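
The significance levels in Table 5 can be checked with the usual t transformation of a product-moment correlation. The sketch below assumes a two-tailed test with N - 2 = 24 degrees of freedom and uses the tabled critical value of 2.797 for the .01 level; the helper name is hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Correlations from Table 5; each is based on 26 Ss.
correlations = {
    "Brainstorming hanger": 0.67,
    "Brainstorming broom": 0.71,
    "Nonbrainstorming hanger": 0.64,
    "Nonbrainstorming broom": 0.81,
}

# Tabled two-tailed critical t at the .01 level for 24 df (assumed here).
T_CRIT_01 = 2.797

t_values = {name: t_from_r(r, 26) for name, r in correlations.items()}
all_significant = all(t > T_CRIT_01 for t in t_values.values())

print(all_significant)  # True
```

Even the smallest correlation, .64, gives a t of about 4.1, well beyond the critical value.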


Discussion AND SUMMARY 


The experiment was designed to study 
the effects on creative problem solving (by 
untrained subjects) of instructions to ex- 
press solutions without evaluation (brain- 
storming) and instructions which required 
only solutions of good quality (nonbrain- 
storming). The design also allowed for a 
study of the effects of training in a creative 
problem solving course emphasizing brainstorming.
Each group of Ss (one group
untrained, the other trained) was given
two problems designed to measure creative 
ability in two testing periods. One problem 
was administered under brainstorming 
instructions; the other problem was ad- 
ministered under nonbrainstorming in- 
structions. The quality of the solutions was 
later evaluated by a trained rater. 

The major findings were as follows: (a) 
Significantly more good quality ideas were 
produced under brainstorming instructions 
than under nonbrainstorming instructions. 
(b) The Ss trained in a creative problem
solving course emphasizing brainstorming
produced a significantly greater number of
good quality ideas when using the tech- 
nique than did the untrained students. 

An additional result yielded by the analysis
of data indicated a positive correlation
between quantity and quality of ideas.
This correlation suggests that the efficacy
of brainstorming in producing an increment
in good ideas is possibly the result of the
increased quantity of ideas encouraged by
the method, at least in a cognitive problem
of the type used in the present experiment.
It is likely that in the customary
course of daily thinking, some ideas are
inhibited by individuals because of fear of
criticism from self or others. The brainstorming
instruction, in reducing this inhibitory
factor and encouraging a greater
quantity of ideas, seems to increase also
the number of good quality ideas produced.
The findings are interpreted to indicate
that the brainstorming instruction is an
effective method for increasing the production
of good ideas in a particular type of
creative thinking problem, and that it is
even more effective if preceded by extensive
training in its use.


REFERENCES 


Harris, R. R., & Simberg, A. L. AC test of
creative ability. (Examiner's manual)
Detroit: General Motors Corp., AC
Spark Plug Division, 1954.

Lindquist, E. F. Design and analysis of experiments
in psychology and education.
Boston: Houghton Mifflin, 1953.

Meadow, A., & Parnes, S. J. Evaluation of
training in creative problem solving. J.
appl. Psychol., 1959, 43, 189-194.

Meadow, A., Parnes, S. J., & Reese, H.
Influence of instructions and problem
sequence on a creative problem solving
test. J. appl. Psychol., in press.

Osborn, A. F. Applied imagination. New
York: Scribner's, 1957.

Parnes, S. J. Description of the University
of Buffalo Creative Problem Solving
Course. Creative Education,
Univer. of Buffalo, 1958. (Mimeo.)


Received December 17, 1958. 


Journal of Educational Psychology
Vol. 50, No. 4, 1959


TRADITIONAL JEWISH CULTURAL VALUES AND PERFORMANCE
ON THE WECHSLER TESTS¹


BORIS M. LEVINSON 
Graduate School of Education, Yeshiva University 


The purpose of this paper is to suggest
that differences between verbal and performance
test scores, as well as test scatter,
may be attributed to subcultural values
emphasizing either verbal or performance
abilities. These cultural emphases are reflected
in the school which in its curriculum
endows one activity with more
prestige, thus making another activity
by comparison less desirable and
participation in it ego deflating. As a case
in point, the scatter on WISC (Wechsler,
1949) and WAIS (Wechsler, 1955) of Jewish
preschool children, elementary school
pupils and college students of traditional
background will be discussed.


The Setting


To the traditional Jew, intensive Jewish
training is the sole preferred means of
maintaining identity. The day (Yeshiva) schools
have thus been established to perpetuate
not only the religious identity but, also,
the national identity (Levinson, 1957). In
these schools a very great stress is placed
on verbal ability to the relative neglect
of performance arts. A double program is
maintained: one in English and one in
Hebrew. There is a long school day coupled
with additional hours of homework. There
is little time left for extracurricular activities,
sports and physical recreation, or for
hobbies and pursuits not directly related to scholastic
pursuits (Levinson, 1959b).

¹ A paper presented in a different form
at the Interamerican Congress in Psychology,
Mexico City, December 1957.
The writer wishes to acknowledge the
assistance of [name illegible in source].


Previous Research


It is well known that environmental
pressures tend to develop certain selective
aspects of intellectual ability (Davis,
1948; Eells, Davis, Havighurst, Herrick, 
Tyler, 1951; Green, 1953; Levinson, 1958a). 
It is further hypothesized by Hebb (1949) 
and Piaget (1947) that early perceptual 
experiences influence intellectual develop- 
ment. The pattern of successes and failures 
on verbal and performance tests also tends 
to vary in relation to cultural needs (Green, 
1953; Strauss, 1954). Dennis (1957), for 
example, found in the study of the Good- 
enough Draw-a-Man test that of a group 
of children from the same socioeconomic 
background, some were prevented from 
scoring as high as others due to certain 
cultural handicaps. The writer (1957) has 
shown that the intelligence of Jewish 
children from traditional backgrounds is 
above the average of the general population.
It also appears from certain studies
that mental traits become more and more
diversified as the child grows older: for boys
aged 9, 12, and 15, the intercorrelations
of memory, verbal, and number abilities
decline as follows: .30, .21, and .18 (Garrett,
1946). Further, we find that children from
upper socioeconomic levels secure a higher 
rating on verbal tests than those who come 
from lower socioeconomic levels (Eells et al.,
1951). Attendance at school and, particu- 
larly, at college, further enhances the 
differentiation of intellectual abilities 
(Hartson, 1936; Levinson, 1958b; Shuey, 
1948). The area of specialization at college 
seems to foster the development of some 
intellectual traits to the relative neglect of 


others (Hartson, 1936). 


PROCEDURE 


Tests Used 

The WISC (Wechsler, 1949) and WAIS 
(Wechsler, 1955) were used in this study. 
The WISC (Wechsler, 1949) was administered
to the preschool and elementary
school children and the WAIS (Wechsler, 
1955) to college students. 

The WISC (Wechsler, 1949) and WAIS 
(Wechsler, 1955) have both performance 
and verbal tests and have been standard- 
ized on a representative sample of the 
population. As a matter of fact, Wechsler 
(1949, p. 1) feels that the materials of the 
two tests overlap and that the WISC 
serves the same function in testing children 
as Wechsler-Bellevue (the precursor of the 
Wechsler Adult Scale) serves in testing 
adults (Wechsler, 1944). However, the 
writer does not wish to imply that the 
same test items have similar psychological 
meaning or value for children and adults. 


Subjects 


The three groups compared were preschool
children, elementary school pupils,
and college students. Only those Ss were
selected who could be considered typical
products of the cultural mold exerted by 
traditional Jewish values. All the Ss chosen 
were "normal" individuals who were not 
known to have personality difficulties 
which would require psychotherapy. 

Since the WAIS (Levinson, 1958b) study 
was completed at the time this research 
was begun, it was necessary to match the 
preschool and elementary school Ss to meet 
the specifications of the WAIS sample. We 
did not have enough preschool children 
with the requisite IQ distribution to match 
the WAIS sample. We finally matched 57 
WISC preschool records with 57 WAIS 
records so selected that the mean and IQ 
ranges of the 57 WAIS scores corresponded 
to the mean and IQ of the entire WAIS 
sample. The preschool sample selected con- 
sisted of 41 boys and 16 girls whose average 
Revised Stanford Binet IQ was 125.20, with 
an SD of 14.00 and whose average age was 
5.6 years. These were native born children
who were candidates for admission to Jewish
traditional day schools and who came
from traditional homes. The full scale
WISC of these children was 117.05, with
an SD of 11.10.


BORIS M. LEVINSON 


The elementary school boys were 
matched by pairs with the WAIS sample. 
These were 64 boys whose average age was 
11.31 and who were selected from a sample 
of 122 children who had attended day 
(Yeshiva) schools since the first grade. The 
full scale WISC of these pupils was 125.08, 
with an SD of 8.85. 

The Yeshiva University Ss consisted of 
64 male volunteer undergraduate and 
graduate students whose mean age was
21.43. All these students had good com- 
mand of English and were graduates of 
elementary and high school Yeshiva 
schools. The full scale WAIS of these 
students was 125.08, with an SD of 8.85. 

Certain limitations regarding the sample
must be borne in mind. While the WISC
and WAIS full scale IQs were held constant,
no allowance was made for possible socio-economic
differences and the possibility
(which was not a probability) that the
samples chosen were not fully representative
of traditional Jewish youth. Further,
a question may be raised as to whether
children with high performance and low
verbal ability who could not compete
academically were dropped by the wayside,
thus biasing the elementary school
and college sample. The writer believes
that this selective factor was not operative
in this study for two reasons: (a) the preschool
sample had high verbal ability, its
Revised Stanford Binet IQ being 125.20;
and (b) in the sampled Jewish day schools,
with its population preselected on the basis
of intelligence, scholastic achievement
depends largely on attitude, motivation, [...]


RESULTS 

Table 1 presents the means and ts for
the preschool, elementary school, and
University Ss. We may note that while the
difference between the verbal and performance
IQs for the preschool children is
significant only at the .10 level of confidence,
the differences for the elementary
and University Ss are at the .001 level of
confidence.

We may further note that when


CULTURAL VALUES AND INTELLIGENCE TEST PERFORMANCE 179 


TABLE 1
SIGNIFICANCE OF THE DIFFERENCES BETWEEN THE MEANS OF THE VERBAL AND
PERFORMANCE IQs OF THE PRESCHOOL, ELEMENTARY SCHOOL, AND UNIVERSITY
STUDENTS

Group             | Verbal Scale | Perf. Scale | Diff. | t
Preschool         | 117.19       | 114.05      |  3.14 |  1.83*
Elementary school | 125.08       | 105.27      | 19.81 | 10.97**
University        | 125.59       | 105.30      | 20.29 | 10.75**

* .10 level of significance.
** .001 level of significance.

the differences between the differences shown
in Table 1 are compared, those involving
the preschool and elementary school and
the preschool and University comparisons
are significant at the .01 level of
confidence (Walker & Lev, 1953, p. 158).
However, the difference between the differences
in the instance of the elementary
school and University Ss is statistically
insignificant. These ts are not shown in
the table.

Table 2 shows the means and SDs of the
scaled score equivalents of the subtests of
the verbal and of the performance parts of
(a) WISC for the preschool and elementary
school children and (b) WAIS for the
University students. An analysis of the
table indicates: (a) minimal scatter among
the subtests of the preschool children, the
mean subtest scores ranging from 11.36 to
13.04, (b) pronounced scatter among elementary
school pupils and college students,
elevated verbal subtests and depressed
performance subtest scores being characteristic
of both. The mean scores of the
elementary school children range from
10.16 to 15.16 and of the University students
from 8.84 to 15.64.
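
The arithmetic behind these statements can be checked directly from the published means. The following is a minimal sketch, not part of the original analysis: it recomputes the verbal-performance differences from the Table 1 means and the subtest ranges from the Table 2 means. The dictionary and function names are illustrative assumptions, and the t values themselves cannot be reproduced without the individual records.

```python
# Means copied from Table 1: (verbal IQ, performance IQ) for each group.
TABLE_1_MEANS = {
    "preschool": (117.19, 114.05),
    "elementary": (125.08, 105.27),
    "university": (125.59, 105.30),
}

# Subtest scaled-score means copied from Table 2, in table order:
# five verbal subtests followed by five performance subtests.
TABLE_2_MEANS = {
    "preschool": [12.80, 12.98, 13.04, 12.40, 11.95,
                  12.38, 12.34, 12.62, 11.36, 11.50],
    "elementary": [14.00, 15.16, 13.59, 14.00, 13.94,
                   10.84, 11.00, 10.37, 10.16, 11.42],
    "university": [15.64, 15.49, 13.84, 13.20, 14.48,
                   11.75, 10.30, 11.02, 8.84, 12.02],
}


def verbal_performance_gap(group):
    """Verbal IQ mean minus performance IQ mean, rounded to two decimals."""
    verbal, performance = TABLE_1_MEANS[group]
    return round(verbal - performance, 2)


def subtest_range(group):
    """Lowest and highest mean subtest score for a group."""
    means = TABLE_2_MEANS[group]
    return min(means), max(means)


for group in TABLE_1_MEANS:
    low, high = subtest_range(group)
    print(group, verbal_performance_gap(group), low, high)
```

The recomputed gaps and ranges agree with the values reported in the text and in Table 1.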


Discussion 


The writer is of the opinion that the 
cultural pressure exerted towards verbal 
learning has brought about this differentia- 
tion of ability at the elementary and college 
levels. He further thinks that somewhat 
similar psychometric patterns may be 
found whenever there is great emphasis on 
verbal learning and a relative neglect of 
performance arts. The writer (in press [b]) 
found such patterns among Irish, Italian 


TABLE 2
MEANS AND SDs OF THE SCALED SCORE EQUIVALENTS OF THE WISC AND THE WAIS

                                         | Preschool    | Elementary School | University
Subtest                                  | Mean  | SD   | Mean  | SD        | Mean  | SD
Verbal
  Information                            | 12.80 | 3.42 | 14.00 | 2.56      | 15.64 | 1.95
  Comprehension                          | 12.98 | 2.70 | 15.16 | 2.90      | 15.49 | 2.43
  Arithmetic                             | 13.04 | 2.90 | 13.59 | 2.54      | 13.84 | 2.80
  Similarities                           | 12.40 | 3.24 | 14.00 | 2.52      | 13.20 | 2.58
  Vocabulary                             | 11.95 | 3.52 | 13.94 | 2.46      | 14.48 | 2.38
Performance
  Picture Completion                     | 12.38 | 2.20 | 10.84 | 3.00      | 11.75 | 2.40
  Picture Arrangement                    | 12.34 | 2.14 | 11.00 | 2.82      | 10.30 | 2.09
  Block Design                           | 12.62 | 2.30 | 10.37 | 2.74      | 11.02 | 2.60
  Object Assembly                        | 11.36 | 2.46 | 10.16 | 2.80      |  8.84 | 2.65
  Coding (WISC) or Digit Symbol (WAIS)   | 11.50 | 3.12 | 11.42 | 3.10      | 12.02 | 2.88
N                                        | 57           | 64                | 64

Note. Digit Span omitted. N as given, except in Preschool (Similarities and
Object Assembly N = 56; Vocabulary N = 55) and in Elementary School
(Coding N = 63).


and Jewish children who were attending par- 
ochial schools. The verbal scores were higher 
than the performance scores in accordance 
with the cultural pressures exerted for ver- 
bal accomplishment by the various subcul- 
tures and their schools. As the writer noted 
in another connection (Levinson, 1958b), 
there is no a priori reason why a person
with a superior intelligence should have a 
higher verbal than performance ability, 
providing, of course, that both verbal and 
performance items are equally valid tests 
of intelligence. The importance of the 
subculture from which the individual comes 
with its emphasis on verbal skills and the 
premium on verbal accomplishment is 
often overlooked. 

The general implication that we draw 
is that the greater the premium placed 
by a certain subculture on language and 
abstract thinking, the greater will be the 
disparity between verbal and performance 
abilities. 

This raises certain questions with respect 
to the Wechsler tests, and presumably 
other tests. Is it possible, for example, that 
the subtests, which are considered good 
measures (Wechsler, 1944) of important 
functions in the general population, do not 
have the same validity when applied to 
Ss from a highly verbal Jewish traditional 
culture? 


SUMMARY 


A comparative study was made of the 
performance and verbal abilities of pre- 
school children, elementary school pupils, 
and college students of traditional Jewish 
background. The groups were matched for 
full scale IQs on either WISC or WAIS. The 
difference between the verbal and perform- 
ance IQs for the preschool children was at 
the .10 level of confidence. However, on the 
elementary and college levels, differences 
were found which were statistically significant
at the .001 level. This is interpreted as
being due to the cumulative effect of Jewish
cultural values emphasizing verbal
abilities.


REFERENCES 


Davis, W. A. Social class influences upon
learning. Cambridge, Mass.: Harvard
Univer. Press, 1948.

Dennis, W. Performance of Near Eastern
children on the Draw-a-Man test.
Child Develpm., 1957, 28, 427-431.

Eells, K., Davis, A., Havighurst, R. J.,
Herrick, V. E., & Tyler, R. W. Intelligence
and cultural differences. Chicago:
Univer. Chicago Press, 1951.

Garrett, H. E. A developmental theory of
intelligence. Amer. Psychologist, 1946,
1, 372-378.

Green, T. C. Individual response to cultural
determinants. J. educ. Sociol.,
1953, 26, 392-399.

Hartson, L. Does college training influence
test intelligence? J. educ. Psychol.,
1936, 27, 481-491.

Hebb, D. O. The organization of behavior.
New York: Wiley, 1949.

Levinson, B. M. The intelligence of applicants
for admission to Jewish day
schools. Jewish soc. Stud., 1957, 19,
[...]-140.

Levinson, B. M. Culture and mental retardation.
Psychol. Rec., 1958, 8, [...]. (a)

Levinson, B. M. Cultural pressure and
WAIS scatter in a traditional Jewish
setting. J. genet. Psychol., 1958, 93,
277-286. (b)

Levinson, B. M. The problems of Jewish
religious youth. Genet. Psychol. Monogr.,
in press. (a)

Levinson, B. M. Subcultural variations in
performance and verbal ability at the
elementary school level. J. genet. Psychol.,
in press. (b)

Piaget, J. The psychology of intelligence.
New York: Harcourt, 1947.

Shuey, A. M. Improvement in scores on the
American Council Psychological Examination
from freshman to senior year.
J. educ. Psychol., 1948, 39, [...].

Strauss, M. Subcultural variation in Ceylonese
mental ability: A study in national
character. J. soc. Psychol., [...],
39, 129-141.

Walker, Helen M., & Lev, J. Statistical
inference. New York: Holt, 1953.

Wechsler, D. The measurement of adult intelligence.
(3rd ed.) Baltimore: Williams
& Wilkins, 1944.

Wechsler, D. Wechsler Intelligence Scale
for Children. New York: Psychological
Corp., 1949.

Wechsler, D. Manual for the Wechsler Adult
Intelligence Scale. New York: Psychological
Corp., 1955.


Received December 20, 1958.




THE JOURNAL OF EDUCATIONAL PSYCHOLOGY

Volume 50 October 1959 Number 5


COLLEGE PRESS AND STUDENT ACHIEVEMENT¹


DONALD L. THISTLETHWAITE 
National Merit Scholarship Corporation, Evanston, Illinois 


The National Merit Scholarship program
offers an opportunity for studying
the development of intellectual talent. The
number of high school students participating
in this nationwide talent search
grew from 58,000 in 1956 to almost a
[...] in 1959. It is estimated that
well over [...] of all the high school seniors
in the highest [...]0% of the population as regards
intellectual ability participated in the
1959 program. One of the aims of this study
was to see what could be learned from these
students about the kinds of environments
conducive to the realization of their potentialities.
The analysis related the environmental
press of different colleges, as measured by
the College Characteristics Index (Pace &
Stern, 1958), to measures of student
achievement. The criterion of achievement
was the percentage of the college's alumni
who earned doctorates. In the educational
literature this criterion has frequently
been interpreted as a measure of
a college's productivity (Knapp & Goodrich,
1955; Knapp & Greenbaum, 1953).


¹ Based upon a paper presented at meetings
of the [...], 1959. This research was
supported by the National Science
Foundation and the Old Dominion Foundation.
The author is indebted to John L. [...]
and Laura Kent for their editorial
assistance and to Lindsey Harmon for making
available prepublication tabulations of
the baccalaureate origins of doctorates awarded
in the United States.


Several investigators (Stuit, Helmstad- 
ter, & Fredericksen, 1956; Holland, 1957) 
point to one difficulty in interpreting the 
Knapp-Goodrich and the Knapp-Green- 
baum results: since no adjustments were 
made for the fact that some colleges get a 
higher proportion of talented students than 
others, we do not know whether the alumni 
of colleges rated high on these indexes ex- 
hibit more achievement because of their 
undergraduate training or because of their 
initial superiority in aptitude. The present 
study attempts to resolve this ambiguity 
by making adjustments in order to equate 
colleges with respect to student quality. 


DEVELOPMENT OF CRITERIA 
FOR ACHIEVEMENT


Because of the reluctance of many col- 
leges to release data on the aptitude of 
their entering classes, it has been difficult 
to make the required adjustments. Our ap- 
proach was to develop an approximate in- 
dex of student quality and to validate it 
for those colleges on which we could obtain 
aptitude data. If the index is valid for these 
colleges it may be used to describe other 
colleges as well. The first two National 
Merit programs provided records of the 
college enrollments of over 9,600 talented 
students. For each of 511 colleges we calculated
the percentage of the freshman class
who were Merit Scholars or Certificate of 
Merit winners in the Merit program during 
the preceding spring. This percentage for
1956 freshman enrollments is referred to as
the 1956 Talent Supply Index. For 39 
men's colleges, the correlation between this 
index and the mean Scholastic Aptitude 
Test score of entering 1956 classes was .74, 
and for 43 women's colleges the correlation 
was .76. These estimates of validity are 
probably conservative, since these colleges 
were all College Board colleges and so 
tended to enroll superior students. The 
Talent Supply Index probably has a 
validity of at least .80 for the wider range 
of talent. To increase the reliability of this 
measure of student quality, the two talent 
supply indexes for the years 1956 and 1957 
were added.²
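
The index as described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the enrollment counts below are invented for the example and are not taken from the study.

```python
def talent_supply_index(merit_freshmen, freshman_class_size):
    """Percentage of a freshman class who were Merit Scholars or
    Certificate of Merit winners the preceding spring."""
    if freshman_class_size <= 0:
        raise ValueError("freshman_class_size must be positive")
    return 100.0 * merit_freshmen / freshman_class_size


def composite_talent_supply_index(index_1956, index_1957):
    """Composite measure: the 1956 and 1957 indexes added together."""
    return index_1956 + index_1957


# Hypothetical college (invented figures): 12 of 400 freshmen in 1956
# and 15 of 420 freshmen in 1957 were Merit Scholars or Certificate winners.
index_1956 = talent_supply_index(12, 400)
index_1957 = talent_supply_index(15, 420)
composite = composite_talent_supply_index(index_1956, index_1957)
print(index_1956, round(index_1957, 2), round(composite, 2))
```

Adding the two yearly indexes, rather than using either alone, is the step the text describes for increasing reliability.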

If it is assumed that the calibre of the 
college’s student body remains relatively 
constant over a period of years, we may use 
our index to estimate student quality for 
the period during which Knapp’s alumni 
groups were in college. By correlating the 
composite Talent Supply Index with the 
Knapp-Goodrich and Knapp-Greenbaum 
productivity indexes, it is possible to esti- 
mate the magnitude of the error introduced 
by ignoring diversities in student quality. 
The Knapp-Goodrich index of science pro- 
ductivity correlates .38 with the Talent 
Supply Index, a figure very close to the 
correlation of .39 reported by these authors 
between their index and ACE aptitude 
scores for a small sample of 50 colleges. The 
Knapp-Greenbaum indexes correlate .71 
and .64 with our Talent Supply Index. 
Thus, variations in student quality appear 
to account for 40 to 50% of the variance
in the Knapp-Greenbaum indexes, and for 
approximately 15% of the variance in the 
Knapp-Goodrich measure. It seems imperative
to control diversities in student quality
if we intend to use college productivity
rates as measures of educational effectiveness.

² The correlation between the two indexes
for the 511 colleges was .92, indicating a fair
degree of stability in talent supply. The
correlation between mean ACE scores of
entering freshmen at 19 Minnesota institutions
of higher learning, over the seven-year
period 1947-1954, was .80 (data from personal
communication, W. R. Layton, University
of Minnesota).
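
The variance figures quoted for the Knapp indexes follow from squaring the reported correlations. The sketch below shows that arithmetic; the function name is an illustrative assumption, and the three correlations are the ones given in the text.

```python
def variance_accounted_for(r):
    """Percentage of variance linearly associated with the predictor: 100 * r squared."""
    return 100.0 * r ** 2


# Correlations with the Talent Supply Index as reported in the text.
reported = [
    ("Knapp-Greenbaum index, r = .71", 0.71),
    ("Knapp-Greenbaum index, r = .64", 0.64),
    ("Knapp-Goodrich index, r = .38", 0.38),
]

for label, r in reported:
    print(label, round(variance_accounted_for(r), 1))
```

The squared values come to about 50%, 41%, and 14%, in line with the "40 to 50%" and "approximately 15%" stated above.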

By assuming stability in college talent 
supplies over the years, we may partial out 
the effects of diversities in student quality. 
Our productivity index for the natural 
sciences is defined as the discrepancy be- 
tween a college’s expected rate of producing 
natural science Ph.D.s (as predicted from 
its enrollment of talented students) and its 
actual rate of productivity. Given the correlation
table relating the college's talent
supply (X) and the percentage of the
college's graduates who earn doctorates
(Y), the residuals in predicting Y from X
give an index of productivity independent
of student quality. Productivity indexes
were then computed for the 511 colleges;
these colleges enrolled about 70% of the
freshmen entering degree-granting colleges
in 1956 and 1957.
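
A residual index of this kind can be sketched as follows. The sketch assumes an ordinary least-squares line for predicting Y from X, which the text does not spell out; the function name and the five-college data set are hypothetical and serve only to show the computation.

```python
def productivity_index(talent_supply, doctorate_rate):
    """Residuals from a least-squares line predicting doctorate rate (Y)
    from talent supply (X). A positive residual means the college produced
    more doctorates than its talent supply alone would predict."""
    n = len(talent_supply)
    if n != len(doctorate_rate) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 colleges")
    mean_x = sum(talent_supply) / n
    mean_y = sum(doctorate_rate) / n
    sxx = sum((x - mean_x) ** 2 for x in talent_supply)
    if sxx == 0:
        raise ValueError("talent supply must vary across colleges")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(talent_supply, doctorate_rate))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(talent_supply, doctorate_rate)]


# Hypothetical data for five colleges (invented, not from the study).
talent_supply = [0.5, 1.0, 2.0, 4.0, 6.0]
doctorate_rate = [1.0, 2.5, 2.0, 5.0, 4.5]
residuals = productivity_index(talent_supply, doctorate_rate)
print([round(r, 2) for r in residuals])
```

Least-squares residuals are uncorrelated with the predictor by construction, which is the sense in which such an index is independent of student quality.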

Since this analysis suggested the need to
treat scholarly and scientific fields separately,
we have developed two productivity
measures: one for the natural
sciences and one for the arts, humanities,
and social sciences. For convenience, the
former is referred to as the NS, and the
latter as the AHSS, index. The records of
doctorates granted which we used were
those published by the National Academy
of Sciences, National Research Council, for
the period 1950-56. A more complete description
of sources of information used in
calculating the productivity indexes is given
in a previous report (Thistlethwaite, [...]).
Since the median lapse between the baccalaureate
and the science doctoral degree
is about seven years, we are dealing with
alumni who graduated from college in
the period 1943-49. Thus we are extrapolating
backwards over a 10- to 15-year
period.

Because of these extrapolations, the indexes
lack sufficient reliability to permit
comparisons of individual colleges. However,
several lines of evidence suggest that
the indexes are sufficiently valid for group
comparisons. First, according to students'
reports, there is a clear difference between
productive and unproductive colleges in
emphasis upon preparing for graduate
study. Second, the measures seem to be
sensitive to differences in college objectives.
Professional and technical schools rank
highest in NS productivity and lowest in
AHSS productivity. Similarly, of students
attending colleges ranking high in AHSS
productivity, 96% report that the college
is exceptionally well equipped with
[...], periodicals, and books in the
[...]. Of students attending
colleges ranking low in AHSS productivity,
only [...]0% endorse this statement.
Third, colleges located in the South tend
to be low on both measures, a finding
which is consistent with the Knapp-Greenbaum
results.

MEASURES OF COLLEGE PRESS


To identify student cultures and faculty
practices which motivate students to
seek the doctorate, student ratings of colleges
varying in productivity were compared.
In this analysis, it is assumed that
the environmental demands or pressures at
the colleges in our sample have not varied
greatly over the past 10-15 years. With
this premise, it is possible to use current
appraisals of the college environment as an
estimate of the college atmosphere during
the period in which our alumni groups were
in college.

The College Characteristics Index (Pace
& Stern, 1958) was administered to 916 of
the National Merit Scholars and Certificate
of Merit winners at 36 colleges. These
students, who were sophomores at the time
of the survey, were asked to judge whether
each of the 300 statements in the CCI was
probably true or probably false about their
college. The average number of observers
per college was 25, and these observers
can hardly be thought of as representative
of the student body, since
they come from an exceptionally talented
group. On the other hand, if a dominant
press really exists in a particular college
almost any group of students attending
that college will probably recognize it.
Fortunately, we had a diverse group of 
colleges. The 36 colleges included: Amherst, 
Brown (including Pembroke), California
Institute of Technology, Carleton, Carnegie,
U. of Chicago, U. of Colorado, Cornell,
Dartmouth, Duke, Georgia Institute of 
Technology, Harvard, Indiana, Iowa State, 
U. of Kansas, Massachusetts Institute of 
Technology, U. of Michigan, U. of Min- 
nesota, Northwestern, U. of Notre Dame, 
Oberlin, U. of Pennsylvania, Pomona, 
Princeton, Purdue, Radcliffe, Rensselaer 
Polytechnic Institute, Rice, Smith, Stan- 
ford, Swarthmore, U. of Texas, U. of 
Wisconsin, Wellesley, Wesleyan, and Yale. 


RESULTS 


The student reports provide abundant 
evidence that college press differ con- 
siderably. Equally important, the press are 
consistent with our expectations. For ex- 
ample, Harvard and Radcliffe had the 
highest median scores on Humanism; MIT 
the highest on Scientism; Georgia Tech. 
and Rensselaer the highest on Pragmatism; 
Smith College the highest on Nurturance; 
and the University of Chicago the highest 
on Understanding. Clearly the CCI reflects 
differences in college atmospheres which are 
consistent with common belief. 

The correlations between student 
achievement and some of the variables de- 
scriptive of college environments were re- 
markably high in view of the assumptions 
of the study. The college's median score on 
each CCI scale was correlated with each
of the productivity indexes. Table 1 summarizes
the results for all scales which
exhibited a correlation significant at the .01
level with at least one of the productivity
measures. Note that 12 of the 30 scales
meet this criterion. The most discriminating
item in each of these 12 scales is shown
in Table 1, together with the response
which was weighted positively. The most
striking feature of these results is that one


TABLE 1
CORRELATIONS BETWEEN COLLEGE CHARACTERISTICS INDEX SCALES
AND PRODUCTIVITY AT 36 COLLEGES

CCI Scale | Most Discriminating Item on Scale | Correlation With Productivity Index: NS | AHSS
Humanism | Few students are planning postgraduate work in the social sciences. (F) | -.23 | [...]
Pragmatism | Students are more interested in specialization than in liberal education. (T) | .15 | -.47**
Reflectiveness | Modern art and music get little attention here. (F) | -.20 | [...]
Sentience | Student rooms are more likely to be decorated with pennants and pin-ups than with paintings, carvings, mobiles, fabrics, etc. (F) | -.35* | [...]
Harmavoidance | Fire drills are held in student dormitories and residences. (T) | -.23 | [...]
Deference | Religious worship here stresses service to God and obedience to His laws. (T) | -.38* | -.50**
Abasement | There is a lot of apple-polishing around here. (T) | -.11 | -.47**
Understanding | There is a lot of emphasis on preparing for graduate work. (T) | .18 | .43**
Scientism | Few students are planning careers in science. (F) | .59** | [...]
Aggression-Blameavoidance | Students ask permission before deviating from common policies or practices. (F) | .56** | -.24
Impulsion-Deliberation | Students frequently do things on the spur of the moment. (T) | .48** | [...]
Order | Professors usually take attendance in class. (T) | -.43** | -.18

* P = .05.
** P = .01.


type of college environment is associated 
with achievement in the natural sciences, 
while a different kind of environment is 
related to accomplishment in the arts, 
humanities, and social sciences. Produc- 
tivity in the humanities is positively related 
to Humanism, Reflectiveness, Sentience, 
Harmavoidance, and Understanding. It is 
negatively related to Pragmatism, De- 
ference, and Abasement. Productivity in 
the natural sciences is positively related to 
Scientism, Aggression, and Impulsion, and 
negatively related to Order, Deference, and 
Sentience. 
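
The significance markers attached to these correlations can be checked with the usual t test for a correlation coefficient. The sketch below is an illustration, not the author's procedure: it assumes two-tailed tests with 36 colleges (34 degrees of freedom), and the critical values 2.032 and 2.728 are taken from a standard t table for df = 34. The function names are hypothetical.

```python
import math

N_COLLEGES = 36  # number of colleges in the CCI sample

# Two-tailed critical t values for df = 34, from a standard t table.
T_CRIT_05 = 2.032
T_CRIT_01 = 2.728


def t_from_r(r, n=N_COLLEGES):
    """t statistic for testing a correlation r against zero with n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def significance_marker(r):
    """Return '**' (.01 level), '*' (.05 level), or '' for a correlation
    based on 36 colleges, using the df = 34 critical values above."""
    t = abs(t_from_r(r))
    if t >= T_CRIT_01:
        return "**"
    if t >= T_CRIT_05:
        return "*"
    return ""


# NS-index correlations read from Table 1.
for scale, r in [("Order", -0.43), ("Scientism", 0.59),
                 ("Sentience", -0.35), ("Humanism", -0.23)]:
    print(scale, r, significance_marker(r))
```

Under these assumptions a correlation of roughly .33 in absolute value reaches the .05 level and roughly .42 reaches the .01 level.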

Although there are obvious differences
between the CCI scales which predict
achievement in the two broad fields, the
implications for college teachers and administrators
are not clear from this analysis.
Part of the ambiguity arises from the
fact that most of the CCI scales are composites
of items descriptive of students and
faculty. In Table 1, for example, the item
taken from the Scientism scale, "Few
students are planning careers in science,"
pertains to student behavior, while the
item quoted from the scale called Order,
"Professors usually take attendance in
class," describes faculty behavior. The
"composite" nature of these scales is well
illustrated by Humanism: it has five items
describing student behavior and values and
five items describing the faculty or administration.
From these correlations we
cannot tell whether faculty influences or
student culture influences, or both, are related
to productivity.

Therefore, another analysis was made
based upon revised scales containing more
homogeneous items. That is, a set of
student press scales were devised from
items in the CCI so that each scale included
only those items descriptive of
student values, interests, or behaviors.
Similarly, we constructed a group of faculty
press scales which included only those items
descriptive of the college faculty or administration.
Items which seemed to describe
the same trait and which exhibited
the same pattern of correlations with the
two productivity measures were grouped
together. Since items which showed no relation
to the achievement criteria were
discarded, the correlations to be reported
should be interpreted with caution. In other
words, these revised scales need to be cross-validated
on a new sample of colleges.
The correlations for the revised student 
press scales are given in Table 2. These 
correlations show the student attitudes, 
interests, and peer group norms which are 
related to achievement. In general, this 
analysis is consistent with the hypothesis 
that scientific and scholarly achievers 
thrive in different types of environments. 
Student cultures characterized by Hu- 
manism, Breadth of Interests, and Reflec- 
tiveness are associated with scholarly 
productivity, whereas cultures character- 
ized by Participation and Aggression are 
negatively related to scholarly produc- 
tivity. Motivation to seek the Ph.D. in the 
natural sciences, on the other hand, seems 
to be stimulated by student cultures which 


TABLE 2
CORRELATION BETWEEN REVISED STUDENT PRESS SCALES
AND PRODUCTIVITY AT 36 COLLEGES

Student Press Scale | Most Representative Item (highest item-total score correlation) | Correlation With Productivity: NS | AHSS
Humanism | There is a lot of interest here in poetry, music, painting, sculpture, architecture, etc. (T) | [...] | [...]
Breadth of Interests | Most students have very little interest in round tables, panel meetings, or other formal discussions. (F) | -.27 | [...]
Reflectiveness | There would be a capacity audience for a lecture by an outstanding philosopher or theologian. (T) | [...] | [...]
Participation | Student pep rallies, parades, dances, carnivals, or demonstrations occur very rarely. (F) | [...] | -.53
Aggression | Hazing, teasing, and practical joking are fairly common. (T) | [...] | [...]
Scientism | When students get together they seldom talk about science. (F) | [...] | [...]
Social Conformity | Students think about dressing appropriately and interestingly for different occasions: classes, social events, sports, and other affairs. (T) | [...] | [...]

* P = .05.
** P = .01.


are high in Scientism and Aggression and 
inhibited by those which stress Social Con- 
formity. It could be, of course, that initial 
differences in student attitudes are partly 
reflected in these correlations. However, it 
seems reasonable to assume that different 
student cultures have considerable effects 
upon student achievement by virtue of the 
kinds of behavior they sanction. 

An equally important part of the college 
press consists of faculty practices and ad- 
ministrative policy. Table 3 shows some of 
the effects of these stimuli upon student 
achievement. The analysis confirms once 
again the view that one type of college press 
stimulates achievement in the natural 
sciences, while a different type facilitates
achievement in the arts, humanities, and 
social sciences, Colleges outstandingly suc- 
cessful in encouraging undergraduates to 
get the doctorate in humanistic fields are 
characterized by (a) excellent social science 
faculty and resources, (b) a flexible, or 
somewhat unstructured, curriculum, (c)
energy and controversiality of instruction, 
and (d) informality and warmth of student- 


faculty contacts. At colleges high in natural
science productivity, too, the faculty tends
to have contacts with students characterized
by informality and warmth, but here
the similarity ends. The latter are characterized
by the absence of outstanding
social science faculties or resources. The
teachers tend to be nondirective in their
teaching methods: for example, students
find it relatively hard to predict examination
questions and to take clear notes in
class; instructors less frequently outline
explicit goals and purposes for courses; and
students are not required to submit outlines
before writing term papers and reports.
Finally, the Closeness of Supervision
scale suggests that the faculty does not play
the role of Big Brother: students need not
sit in assigned seats and attendance is not
taken; student organizations are not closely
supervised to guard against mistakes;
faculty members are tolerant and understanding
in dealing with violations of rules.
The scale called Informality and Warmth
of Student-Faculty Contacts is of special
interest since it seems to predict achievement
in both areas. The most representative
items of this scale tell us something
about the behavior of the teacher who
stimulates graduate study: he does not see
students only during office hours or by appointment;
open displays of emotion are
not likely to embarrass him; students need
not wait to be called upon before speaking
in class; in talking with students he frequently
refers to his colleagues by their
first names; students do not feel obliged to
address him as "professor" or "doctor." In
[...] teacher is [...]
does not encourage deference
or abasement in his students.


TABLE 3
CORRELATIONS BETWEEN REVISED FACULTY PRESS SCALES
AND PRODUCTIVITY AT 36 COLLEGES

Faculty Press Scale | Most Representative Item (highest item-total score correlation) | Correlation With Productivity: NS | AHSS
Excellence of social science faculty and resources | Course offerings and faculty in the social sciences are outstanding. (T) | -.42* | .83**
Flexibility of curriculum | If a student fails a course he can usually substitute another one for it rather than take it over. (T) | -.31 | .68**
Energy and controversiality of instruction | Class discussions are typically vigorous and intense. (T) | -.18 | .58**
Informality and warmth of student-faculty contacts | Faculty members and administrators see students only during scheduled office hours or by appointment. (F) | .43** | .40*
Closeness of supervision | Professors usually take attendance in class. (T) | -.38* | [...]
Directiveness of teaching methods | Instructors clearly explain the goals and purposes of their courses. (T) | [...] | [...]

* P = .05.
** P = .01.
[...] college productivity.
Though the number of hours
a student spends in study is
not a measure of achievement, one would
generally expect to find more study in
[...] environments. As a part
of a [...], we asked some 1900
Certificate of Merit winners attending 35
of the colleges previously mentioned to
estimate the average number of hours per
week they spent in study outside the classroom.
An analysis of variance indicates that
there are significant differences in
mean hours of study at these 35 colleges.
The correlations between the press scales
and mean hours of study at each of
the colleges are shown in Table 4.

In general, the average number of hours
of study by a college's students was related
to AHSS productivity but unrelated
to NS productivity. The correlations were
.48 and .01 respectively. It seems that
systematic study habits outside the classroom
(and perhaps also outside the laboratory)
make for achievement in the
humanities but not in the natural sciences.
Perhaps the two broad fields require different
types of study: the promising young
scientist needs his laboratory to make discoveries,
while the scholar depends more
on library resources. In any case, there
is a striking similarity between the college
press correlates of the AHSS and the hours
of study measures. Students study more




TABLE 4

CORRELATIONS BETWEEN REVISED PRESS SCALES AND HOURS OF STUDY

College Press | Correlation With Mean Hours of Study
I. Student Culture
Reflectiveness | .54**
Humanism | .47**
Breadth of Interests | .43**
Participation | -.42*
Aggression | .24
Scientism | .20
Social Conformity | [...]
II. Faculty Press
Flexibility of Curriculum | [...]
Energy and Controversiality of Instruction | .56**
Emphasis Upon High Academic Standards | [...]
Excellence of Social Science Faculty and Resources | .42*
Informality and Warmth of Student-Faculty Contacts | .38*
Closeness of Supervision | -.82
Directiveness of Teaching Methods | .29
* P = .05.
** P = .01.


outside the classroom, and are stimulated 
to obtain doctorates in the humanities and 
social sciences, primarily when the domi- 
nant student culture is characterized by 
Reflectiveness, Humanism, Breadth of In- 
terests, and by relatively little Participa- 
tion. Similarly faculty press which stimu- 
late graduate study in the humanities and 
social sciences also tend to encourage more 
study outside the classroom. Emphasis 
upon high academic standards is signifi- 
cantly related to hours of study, as it should 
be, although this press did not differentiate 
between colleges which are high and low in
productivity.
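
The stars in Table 4 can be checked with the usual conversion of a correlation to a t statistic, t = r * sqrt(N - 2) / sqrt(1 - r^2), where N = 35 is the number of colleges. The sketch below is illustrative only: the function names are hypothetical, and the two-tailed critical values for 33 degrees of freedom (about 2.035 at the .05 level and 2.733 at the .01 level) are taken from a standard t table, since the article does not say how its significance levels were obtained.

```python
import math

N_COLLEGES = 35  # number of colleges reported in the text

# Two-tailed critical t values for df = 33, copied from a standard t table.
# These are an assumption of this sketch; the article does not report them.
T_CRIT_05 = 2.035
T_CRIT_01 = 2.733


def t_from_r(r: float, n: int) -> float:
    """Convert a Pearson correlation based on n pairs to a t statistic."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def stars(r: float, n: int = N_COLLEGES) -> str:
    """Return '**', '*', or '' following the table's footnote convention."""
    t = abs(t_from_r(r, n))
    if t >= T_CRIT_01:
        return "**"
    if t >= T_CRIT_05:
        return "*"
    return ""


if __name__ == "__main__":
    legible_rows = [
        ("Reflectiveness", 0.54),
        ("Breadth of Interests", 0.43),
        ("Participation", -0.42),
        ("Aggression", 0.24),
    ]
    for label, r in legible_rows:
        t = t_from_r(r, N_COLLEGES)
        print(f"{label:22s} r = {r:+.2f}  t = {t:5.2f} {stars(r)}")
```

Under these assumed critical values, a correlation of .43 falls just above the .01 threshold and one of .42 just below it, which agrees with the pattern of single and double asterisks among the legible entries of the table.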


Discussion


College press conducive to intellectual
achievement are concepts of great theoretical
importance, since they help to
organize research on higher education. Such
conceptualizations might suggest methods
for improving higher education. The anal-
ysis indicates that college faculties and
student cultures play important roles in 
motivating undergraduates to seek ad- 
vanced training. The college press which 
encourage the scientist differ from those 
which inspire the scholar. There are, to be 
sure, some college characteristics—par- 
ticularly informal and friendly contacts 
between faculty and students—associated 
with achievement in both the natural 
sciences and humanities, but the differences
are more striking than the similarities. 
Particularly noteworthy are those press 
scales which correlate positively with one 
productivity index and negatively with the 
other. 

Some of these correlations indicate a 
specialization of function in American col-
leges. For example, the CCI scales called
Humanism and Scientism, the revised 
Student press scales having the same desig- 
nations, and the rating of the excellence of 
social science faculty and resources un- 
doubtedly reflect diversities in educational 
objectives. In addition, it is possible to 
discern an ethos of the college high in 
natural science productivity distinct from 
that of the college excelling in the produc- 
tion of potential doctorates in the humani- 
ties. The environment productive of natural 
scientists is characterized by student ag- 
gression, nonconformity, and commitment 
to science; the faculty tends to be nondirec- 
tive in teaching methods though adhering 
to strict curricular requirements. One 
student at a technological institute captures 
some of this quality in his comment: “The 
school tries very hard to flunk you out.”
On the other hand, colleges high in AHSS 
productivity are characterized more fre-
quently by policies which challenge the
student without threatening him—this 
orientation is suggested by the CCI state- 
ment, “If a student fails a course he can 
usually substitute another one for it rather 
than take it over.” Similarly, energy and 


DONALD L. THISTLETHWAITE 


enthusiasm seem more typical of teachers
in these schools, a characteristic which 
other investigators have found to be related 
to student evaluations of teachers (French, 
1958). Students at colleges high in AHSS 
productivity, unlike their NS counterparts, 
are characterized by breadth of interests, 
greater reflectiveness, limited participation
in campus antics, and limited expression of
aggression toward faculty and fellow stu-
dents.

The present analysis clearly needs to be
extended in many directions before we can
explain why some educational environ-
ments are more effective than others.
Especially promising from the standpoint of
establishing causal relations are longi-
tudinal studies which follow students
matched for initial aptitude and [...]
motivations through the undergraduate
and postcollege years. Such studies are now
being initiated among intellectually
talented students in the National Merit
Scholarship program.


SUMMARY 


The present report suggests that the
college environment is an important
determinant of the student's motivation to
seek advanced intellectual training.
Moreover, the student cultures and faculty press
which stimulate achievement in the natural
sciences appear to differ from those which
stimulate achievement in the arts,
humanities, and social sciences. New scales
for assessing student cultures and faculty
behaviors are described, and it is shown
that these dimensions have promise as
predictors of student achievement and
the amount of study of the student outside
the classroom.


REFERENCES

FRENCH, Grace M. College students' concept
of effective teaching. Amer. Psychologist,
1958, 13, 378. (Abstract)

HOLLAND, J. L. Undergraduate origins of
American scientists. Science, 1957, 126,
433-437.

KNAPP, R. H., & GOODRICH, H. B. Origins
of American scientists. Chicago: Univer.
Chicago Press, 1952.

KNAPP, R. H., & GREENBAUM, J. J. The
younger American scholar: His collegiate
origins. Chicago: Univer. Chicago
Press, 1953.

PACE, C. R., & STERN, G. G. College charac-
teristics index, Form 458. Syracuse,
N. Y.: Authors, 1958.

PACE, C. R., & STERN, G. G. An approach
to the measurement of psychological
characteristics of college environments.
J. educ. Psychol., 1958, 49, 269-277.

STUIT, D. B., HELMSTADTER, G. C., &
FREDERICKSEN, N. Survey of college
evaluation methods and needs. Princeton,
N. J.: Educational Testing Service,
1956.

THISTLETHWAITE, D. L. College environ-
ment and the development of talent.
Science, 1959, 130, 71-76.


Received April 17, 1959. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 5, 1959 


ABNORMAL PSYCHOLOGY AS A SELECTIVE FACTOR: 
A CONFIRMATION AND EXTENSION 


LEON M. WISE 
Heidelberg College 


There have often been allusions made by 
college students and others to the effect 
that students who register for a course in 
abnormal psychology are, or tend to be, 
more abnormal, neurotic, or poorly adjusted 
than one might expect to find by chance 
in the average classroom. One study (Mills, 
1955), which has already shed some light 
on the problem, compared students in an 
abnormal psychology course with students 
in a European history course. The general 
findings indicated a statistically significant 
difference between the two courses using 
the Munroe inspection technique. The 
author implied that psychology acts se- 
lectively on students in that, frequently, 
poorly adjusted students are attracted to 
psychology courses. 

It came to the present author’s attention 
that it might not be psychology, per se, 
which acts as a selective factor, but specific 
course content. That is, if the course con- 
tent deals mainly with the bizarre aspects 
of human behavior, this would tend to 
attract students in emotional conflict more 
readily than course content dealing with the 
more unemotional aspects of human be- 
havior. For example, it was felt that if 
psychology courses were to be compared 
with respect to morbidity of content and 
the emotional stability of their respective 
students, a relationship would be found to 
exist. In courses which might be termed 
nonpsychological in nature, emotional con-
flict of students should be minimal, while
in psychology courses, and especially ab- 
normal psychology, it should be maximal. 
The assumption here is that amount of 
morbid subject matter varies among the 
different psychology courses. 


PROCEDURE 


The present experiment was conducted
in a small midwestern coeducational
institution, population 695. The Cornell
Index, Form N-2, was selected as the
instrument to measure emotional conflict or poor
adjustment. This instrument measures the
number of neuropsychiatric and psychosomatic
symptoms and is objectively scored.
It was decided to compare students in four
college courses: a communications course,
N = 46; a general psychology course, N =
56; a child psychology course, N = 33; and
an abnormal psychology course, N = 36.
These courses had approximately the same
proportion of males and females. This was
important owing to the fact that the norms
for the Cornell Index show differences
between the sexes. For the most part, the
communications course consisted of
freshmen, and the general psychology course
consisted of sophomores. The abnormal and
child psychology courses consisted mostly
of sophomores, but did include a few
juniors and seniors. In addition, the general
psychology course was a prerequisite for
both child and abnormal psychology.
However, child psychology was not a
prerequisite for the abnormal psychology
course.

The Cornell Index was administered on
the first class day of each course and the
results compared. This was done to
eliminate the possible biasing effects of being
exposed to course subject matter.


RESULTS AND DISCUSSION


Analysis of variance yielded a statis-
tically significant F ratio of 3.05 (P < .01),
thus confirming the results of the Mills
study (1955).




PSYCHOLOGY AS A SELECTIVE FACTOR 193 


In addition to the statistically significant 
F ratio, however, the data, as shown in 
Table 1, lend support to the present 
author's hypothesis stated previously. Ob- 
servation of the means shows the following 
trend. As the presumed amount of mor-
bidity increased in courses, mean scores on
the Cornell Index showed an increase. How- 
ever, only a small difference between the 
means of general and child psychology is 
evidenced. This might be an indication 
that there is relatively little difference in 
the amount of morbid or bizarre subject 
matter in the two courses. Based on this 
trend, t tests were calculated
individually for all comparisons.

The results as presented in Table 2 show
that the only statistically significant t was
in the communications-abnormal psychology
comparison. However, since the
difference between general and child
psychology was slight, these groups were
combined and t tests calculated. The t value for
the difference between means of the
combined groups and abnormal psychology was
statistically significant (p < .05). The t
value for the difference between means of the
combined groups and communications was
1.96. For significance at the .05 level, a t
of 1.97 is required. Thus, the latter
probability was <.06 but >.05. This is
interpreted as evidence to support the
contention that abnormal psychology students
are more maladjusted than general and
child psychology students. Furthermore,
general and child psychology students are
probably more maladjusted than
communications students.


TABLE 1

MEANS AND STANDARD DEVIATIONS OF DIFFERENT COLLEGE COURSES

Course | N | X | SD
Communications | 46 | 7.54 | 6.42
General psychology | 56 | 9.79 | 7.79
Child psychology | 33 | 10.09 | 5.42
Abnormal psychology | 36 | 12.92 | 7.37


TABLE 2

t VALUES FOR MEANS OF DIFFERENT COLLEGE COURSES

Courses | df | t Values | P Values
Communications vs. General psychol. | 100 | 1.55 | >.05
Communications vs. Child psychol. | 77 | 1.84 | >.05
Communications vs. Abnormal psychol. | 80 | 4.75 | <.01
General psychol. vs. Child psychol. | 88 | .20 | >.05
General psychol. vs. Abnormal psychol. | 90 | 1.93 | >.05
Child psychol. vs. Abnormal psychol. | 67 | 1.92 | >.05
Gen. & Child (combined) vs. Communications | 133 | 1.96 | >.05
Gen. & Child (combined) vs. Abnormal psychol. | 123 | 2.23 | <.05


These findings support the author's hy-
pothesis and seem to indicate that psy-
chology, per se, is not the most important
selective factor in attracting poorly ad- 
justed students. Instead, the findings show 
that the study of bizarre or deviant be- 
havior is the more likely culprit. 
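
The individual comparisons in Table 2 can be approximated from the summary statistics in Table 1. The following sketch assumes a pooled-variance t test, which is consistent with the degrees of freedom listed in Table 2 (n1 + n2 - 2) but is not stated in the article; the function name is hypothetical. Because the inputs are rounded means and standard deviations, the results should be expected to agree with the published values only approximately.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary data."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / se, df


# Means, SDs, and Ns as given in Table 1 (Cornell Index scores).
communications = (7.54, 6.42, 46)
general_psych = (9.79, 7.79, 56)
child_psych = (10.09, 5.42, 33)

if __name__ == "__main__":
    for label, course in [("General", general_psych), ("Child", child_psych)]:
        t, df = pooled_t(*communications, *course)
        print(f"Communications vs. {label} psychology: t = {t:.2f}, df = {df}")
```

Under these assumptions the two comparisons come out near 1.57 and 1.86, close to the 1.55 and 1.84 reported in Table 2.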

If it is permissible to generalize to other 
similar abnormal psychology courses in 
other institutions of higher learning, it 
might be well to reflect on the best method 
of teaching such a course. This has already 
been touched upon elsewhere (Mills, 1955). 
It is quite possible that students registering 
for such a course are not only interested in 
the subject matter as subject matter, or a 
grade, as may well be the case in other 
academic areas, but are also interested in 
getting some help and/or insight into the 
nature of their own personal problems or 
the problems of others. The present author, 
after interviewing students in an abnormal 
psychology course, found this possibility 
more of a reality suggesting that considera- 
tion should be given to the possibility of 
orienting the course, at least in part, in the 




general direction of a therapeutic-type 
teaching situation. That is, a secondary ob- 
jective might be added to the course in 
addition to the usual academic objective. 

Autobiography, as suggested by Brower 
(1947), offers the possibility of aiding in 
this respect. A group therapy approach 
offers still another possibility. Certainly re- 
assurance might be consciously given from 
time to time. In addition, alertness might 
be maintained for “loaded” questions. Re- 
gardless of method utilized, and it would 
undoubtedly vary considerably depending 
on circumstances, consideration should be 
given to the fact that students in abnormal 
psychology apparently desire something 
more than is customary in other academic 
areas. Additional research is needed to shed
still more light on this problem. 

The usual practice of requiring psy- 
chology majors to take abnormal psy- 
chology raises an additional question. Are 
psychology majors more poorly adjusted 
than nonpsychology majors? The present 
data do not permit a conclusion one way
or the other. Additional study is necessary
in order to shed light on this question. 


SUMMARY 


The Cornell Index was administered to
four college courses varying in amount of
morbid subject matter. They were: a non-
psychology course, a general psychology
course, a child psychology course, and an
abnormal psychology course. A statistically
significant F ratio was found (p < .01). In
addition, a relationship between average
adjustment scores of students and mor-
bidity of course subject matter was pointed
out. It was concluded that psychology, per
se, was not as important a selective factor
as was specific course content.


REFERENCES

BROWER, D. The use of an autobiography
in a course in abnormal psychology. J.
genet. Psychol., 1947, 71, 253-257.

MILLS, E. S. Abnormal psychology as a se-
lective factor in the college curriculum.
J. educ. Psychol., 1955, 46, 101-111.


Received January 14, 1959. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 5, 1959


THE RELATIONSHIP OF AUTHORITARIAN PERSONALITY TO LEARNING:
F SCALE SCORES COMPARED TO CLASSROOM PERFORMANCE


ANN FILINGER NEEL 
Wyandotte County Guidance Clinic, Kansas City, Kansas 


[...] laboratories in which learning
theories are put to test need not concern
themselves with the personalities of their
[...] subjects. The subject, rat,
[...] or guinea pig, usually will press
bars, peck keys, or pull levers or salivate
with, so far as we know, no prejudice or
[...] about the bar or lever he
presses, the bell or food pellet to which he
responds. But human learning becomes
more complex because the kind of
person the subject is influences his reaction
to the material he is to learn, and the way
he goes about learning it. It will
also influence what he learns, or at least
what kind of performance he will mani-
fest. In fact, it is this differential reaction
which specifies and reinforces the person-
ality of the subject.

The present paper explores one facet of
the interaction between personality and
the process by which exposure to new data
alters behavior. While the appearance of
such alterations will ordinarily be regarded
as sufficient evidence for learning, one can
not altogether rule out the possibility that
there are variations in performance due
to other variables. This will be brought out
later in discussing some of the findings.

The type of personality selected for
study was the so-called authoritarian
personality (Adorno, Frenkel-Brunswick,
Levinson, & Sanford, 1949). In addition to
[...] over dominance and submis-
sion, there are certain other characteristics
hypothesized to be present in such a person-
ality type. Such a person is presumed to be
unable to tolerate ambiguity, preferring to
deal with well defined and well ordered
material, and prone to imposing such order
upon the world of his own accord if it does
not exist in its own right. Once he has
organized his perceptions or beliefs he is
slow to change if he can do so at all. He
responds to people as to any other stimulus,
fitting them into moral categories rather
than appreciating individual variations.
He can not understand or empathize with
others, and more than likely can not even
like them.

Presented at the American Psychological
Association, Washington, D. C., September [...].

How might such a personality affect the 
learning process? In the first place, such a 
person, with his resentment and antago- 
nisms for others, should find it difficult to 
learn to achieve certain types of knowledge 
about and understanding of human be- 
havior. A person who can not understand 
or tolerate others would find it hard to 
achieve the attitudes of humanitarian in- 
structors. Learning in this area would be 
complicated by the lack of definitive struc- 
ture and established fact. Aside from 
difficulties with the material itself, the 
authoritarian person would probably find 
it uncomfortable and perhaps distasteful
to be exposed to this type of content. Simi- 
lar complications would not be expected 
with more factual learning. 

These considerations generate the follow- 
ing hypotheses: 

1. The more authoritarian a person is,
the more likely he is to have difficulty
learning material which (a) deals with
humanitarian philosophy or (b) is ambigu-
ous. Such difficulty should not be evidenced
where learning of factual material is in-
volved.

2. The more authoritarian a person is,
the more dislike he should manifest for
materials involving ambiguous or humani-
tarian materials.


SUBJECTS AND PROCEDURE 


Subjects. The persons employed as Ss were 
30 male senior medical students taking a 
required class in psychiatry. This class 
included experience in evaluating and treat- 
ing persons with emotional problems. The 
emphasis was placed on the type of prob- 
lems met with in general medical practice, 
not on psychiatry per se. In addition to 
patient contacts, there were numerous 
seminars and discussions regarding therapy 
and diagnosis. 

Assessment of authoritarian personality. 
The group of students was given the F 
scale (Adorno, Frenkel-Brunswick, Levin- 
son, & Sanford, 1949) during one of their 
class periods. They were told this was a 
study by psychologists investigating social 
attitudes among medical students with 
reference to the kind of speciality they
intended to enter. They were asked to put
their names on the attitude scales so that
their scores could be correlated with their
eventual choice of fields.


Assessment of learning. During the course,
the students were given several examina-
tions. One of these was considered to be a
test of factual information, although it is
not a pure culture test thereof since it in-
volved the necessity of translating facts
into behavior. The test is reproduced here:

Your patient is a young married woman 
who is very hostile toward her husband but 
can not accept this. Describe the behavior 
and/or remarks by which she might demon- 
strate the following defenses against her 
conflict: Repression, Reaction Formation, 
Fantasy, Withdrawal, Displacement, Pro-
jection, Intellectualization, Suppression.


A maximum score of 10 points could be 
earned on each item. The tests were 
scored by three of the staff, each answer 
being discussed by the group before a 
score was assigned. 

The other quiz employed here presented
fictitious case situations for the student to
handle. It demanded that the S deal with
a far more ambiguous area, where there are
no set rules. The first question of this test
was set up with special reference to the
hypothesis regarding humanitarian phi-
losophy. Much time had been spent during
the class stressing the lack of success in
treating and changing certain types of
people and the consequent necessity of
environmental manipulation and social
support.


1. You are in general practice in a me-
dium sized town. A 37-year-old man comes
to you as a private patient. He is married
and has three small children. He presents a
history of a minor back injury 12 years ago
and has had vague complaints of back pain
since that time. He states that for this rea-
son he is unable to work and has been un-
employed most of the time for four years.
Job history prior to the injury was [...]
odd jobs which produced irregular [...]
income. The family has been receiving gen-
eral assistance from the County [...]
Department at those times the patient was
unemployed. The patient has now applied
for permanent and total disability assist-
ance on the basis that he has been to many
physicians and, he says, none have helped
him. Your complete medical examination is
negative. You estimate his intelligence to
be slightly below normal. There is [...] evi-
dence of a psychotic process. He stated the
world was too much for him and he had no
plans or concern for the future. What would
your diagnostic impression be? What is your
philosophy about the community's respon-
sibility for the man and his family?

2. You are in general practice and a [...]-
year-old woman, married six months, comes
to you with the complaint of frigidity. Your
physical examination is negative. She asks
for a good book on sex. How would you
handle the situation and why?

3. You are in general practice and you
are treating a 55-year-old lady who has been
successfully employed as a bookkeeper.
Since a cerebral vascular accident about
six months ago, she has been complaining
of memory difficulty and inability to per-
form her job as in the past. Psychological
testing reveals moderate organic [...]
but sufficient ability to perform a de-
manding job. She is not responding satis-
factorily under medical treatment and ex-
presses concern about herself and [...].
In view of this, you feel there are psychological

AUTHORITARIAN PERSONALITY AND LEARNING 


[...] to the problem. How would you
handle the situation and what would your
[...]


For each question, the staff formulated
points which the students should cover and
then discussed each answer and assigned a
score between 1 and 5.

Assessment of student reaction to course.
During the final class period, the group was
asked to evaluate the class and discuss
[...] to it. This was routine pro-
cedure and the students were well aware
that this opportunity was to be given.
[...] the students were able to express
[...]. Obviously some of them
[...] their remarks for fear of getting
[...], but the fact that graduation
was [...] assured served to relieve them
to some degree. The discussion was recorded
by [...] who was apparently
taking [...] for use in planning future
[...]. These remarks were later coded
by the following criteria:


I. Structure-Ambiguity continuum: re-
marks reflecting a desire for structure
through [...] better organization, criti-
cism of the class for vagueness, etc. were
scored +1; neutral remarks were scored 0;
and remarks concerning a desire for flexible
“unstructured” program were scored -1.

II. Attitude toward Psychiatric Patients:
remarks reflecting a dislike of patients, a
rejection of patients, or a feeling that pa-
tients did not like them were scored +1; neu-
tral remarks were scored 0; and like or don't
mind patients scored -1.

III. Attitude toward Psychiatry: re-
marks [...] unnecessary, etc. scored
+1; neutral remarks scored 0; and remarks
[...] psychiatry, such as “want
[...] medical practice” were scored
-1.

IV. Evaluation of own ability to deal
with material: remarks of “weak” or “in-
adequate” scored +1; neutral remarks
scored 0; remarks of “adequate” scored -1.

[...] remarks were recorded as
0. The values were added algebraically
to get a total score.

The investigators did not know the Ss’
F scale scores at the time of these ratings.
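
The scoring rule described above, in which each coded remark contributes +1, 0, or -1 and the values are summed algebraically, can be sketched as follows. The category keys and the sample remarks are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
from typing import Dict, Iterable

# The four coding criteria described in the text (key names are illustrative).
CATEGORIES = ("structure", "patients", "psychiatry", "own_ability")


def total_reaction_score(codes: Dict[str, Iterable[int]]) -> int:
    """Algebraic sum of coded remarks; a category with no remarks adds 0."""
    total = 0
    for category in CATEGORIES:
        for value in codes.get(category, ()):
            if value not in (-1, 0, 1):
                raise ValueError(f"invalid code {value!r} in {category}")
            total += value
    return total


if __name__ == "__main__":
    # Hypothetical student: asks for more structure (+1), does not mind
    # patients (-1), calls psychiatry unnecessary (+1), and makes no remark
    # about his own ability (recorded as 0).
    example = {"structure": [1], "patients": [-1], "psychiatry": [1]}
    print(total_reaction_score(example))
```

A higher total indicates more expressed discomfort or dislike, which is the direction the second hypothesis predicts for students above the F scale median.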




No effort was made to assess intelligence
since it was believed that students who had
survived college and four years of medical
school should have sufficient and fairly
uniform ability to learn.


RESULTS AND DISCUSSION 


Table 1 gives the chi square comparison 
of the F scale score and the score obtained 
on the test question regarding humani- 
tarian philosophy. If the relation predicted 
by Hypothesis 1 does exist, we would 
expect those persons above the median on 
authoritarianism to score below the median 
of the question. Such is the case. Thus it 
does appear that a person’s general feeling 
about people, his social philosophy, deter- 
mines or at least is related to whether or 
not the person can or does learn and/or 
use material relating to humanitarianism. 
(It may be, of course, that there is no 
difference in learning. The observed dif- 
ferences could be the results of variations 
in performance due to variables other than
learning. Just how one can bring out the 
existence of learning which did not affect 
behavior is, however, à question which 
haunts the operational minded investiga- 
tor.) 

The comparison of F scale and the total 
score on the ambiguous test is given in 
Table 2. Again, the prediction is borne out. 

The hypothesis holds that more factual 


TABLE 1

CHI SQUARE COMPARISON OF SCORE ON F SCALE WITH SCORE ON QUESTION REGARDING HUMANITARIAN PHILOSOPHY

F Scale (Median 71)
Humanitarian Question | Md. or Below | Above Md. | Total
Md. or Below | 5 | 9 | 14
Above Md. | 13 | 3 | 16
Total | 18 | 12 | 30
χ² = 6.45.
p = .005 (one-tailed test).




TABLE 2

CHI SQUARE COMPARISON OF SCORE ON F SCALE WITH SCORE ON “AMBIGUOUS” TEST

F Scale (Median 71)
Ambiguous Test | Md. or Below | Above Md. | Total
Md. or Below | 4 | 8 | 12
Above Md. | 14 | 4 | 18
Total | 18 | 12 | 30
χ² = 0.08.
p = .005 (one-tailed test).

TABLE 3

CHI SQUARE COMPARISON OF SCORE ON THE F SCALE WITH SCORE ON “FACTUAL” TEST

F Scale (Median 71)
Factual Test | Md. or Below | Above Md. | Total
Md. or Below | [...] | [...] | [...]
Above Md. | [...] | [...] | [...]
Total | 13 | 10 | 23 (a)
χ² = .71 (corrected for continuity).
p = .40.
(a) Seven Ss did not take the factual test. Of these
seven, four were below the F scale median, three above
it.


material should not be subject to this 
differential effect. Table 3 gives the com- 
parison of F scale and “factual” test, Here 
the chi square is not significant, as the 
hypothesis would predict. 
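
The chi square values reported here come from 2 x 2 median-split tables, so they can be recomputed directly from the cell counts. The sketch below uses the standard shortcut formula, chi square = N(ad - bc)^2 / (r1 r2 c1 c2), with an optional Yates correction for the tables marked "corrected for continuity." Taking the one-tailed probability as half the chi-square tail probability for one degree of freedom is an assumption about how the article's one-tailed values were obtained, and the function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi square (1 df) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2.0, 0.0)  # continuity correction
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def one_tailed_p(chi2):
    """Half of the upper-tail chi-square probability with 1 df."""
    # For 1 df, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Cell counts from Table 1 (humanitarian question by F scale median split).
    chi2 = chi_square_2x2(5, 9, 13, 3)
    print(f"chi square = {chi2:.2f}, one-tailed p = {one_tailed_p(chi2):.4f}")
```

For the Table 1 counts this gives a chi square of about 6.45, matching the reported value, with a one-tailed probability near .005.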

The final prediction was that the more 
authoritarian person should be more un- 
comfortable in the class and dislike the 
subject matter. The student’s reaction to 
the course was assessed by coding his 
evaluative remarks regarding the class 
experience. The prediction would call for 
persons above the F scale median to be 
above the remark median. Table 4 reveals 
that while the trends are in the predicted
direction, the results fail to reach an accept-
able level of significance. This may have
occurred because of a tendency for the
students to soften their remarks or to say
what was expected (this would be expected
of authoritarian persons). Eight students
made no remarks, resulting in an extremely
small N. Also, the prediction might be
dulled by the fact that many evaluative
comments were quite accurate and [...].

Actually it was the students’ remarks
about the class which initiated the study.
Bitter complaint had been made by the
students to various staff members, and
some had voiced extreme dislike of course
and subject matter. The greatest source of
unhappiness had been the seminar or dis-
cussion-group nature of the class and the
fact that nothing was ever “tied [...]”
for them. This was brought up [...]
when case presentation had been particu-
larly organized and definitive. This, and
many other efforts to meet student criti-
cism with lack of success, suggested that
something must be interfering with the
class other than the usual problems of
technique, subject matter, etc. In view of
these observations, and the suggestive
trends in the data, it seems that the hypoth-
esis deserves another, more sensitive test
before being discarded.


TABLE 4

CHI SQUARE COMPARISON OF SCORE ON F SCALE WITH REACTIONS TO CLASS

F Scale (Median 71)
Reactions | Md. or Below | Above Md. | Total
Md. or Below | 10 | 4 | 14
Above Md. | 3 | 5 | 8
Total | 13 | 9 | 22 (a)
χ² = 1.22 (corrected for continuity).
p = .15 (one-tailed test).
(a) Eight students made no evaluation remarks; of
these eight, five were below the F scale median, three
were above it.




Returning to the general problem of the 
authoritarian personality on learning, one 
student’s remark regarding a case presen- 
tation illustrates the hypotheses of this 
study perhaps better than all the interpre- 
tive comments so far given. “I would like 
a classical example. The staff should make 
a diagnosis. If it turns out to be something
else, turns out wrong, O.K., but it’s some-
thing to remember. Then when I see a
patient in practice I can say ‘I saw one
like you.’ It is so confusing now.”


SUMMARY 


It was hypothesized that the more
authoritarian a person is, the more likely
he would be to have difficulty (a) in learn-
ing material which involved humanitarian
philosophy and the need for understanding
people, (b) in mastering ambiguous mate-
rial which required him to think on his
feet, but (c) not in learning of factual sub-
ject matter. The authoritarian person
should be more uncomfortable with, and
more likely to state dislike for ambiguous
[...], and psychological subjects,
and for a democratic teaching atmosphere.

Thirty male seniors in a medical school
taking a required course in psychiatry were
given the F Scale, the scores on this being
compared by means of chi square with
scores on a test question dealing with social
philosophy regarding indigent persons, with
an examination dealing with handling case
situations (“ambiguous” material), and a
quiz over factual matter. The students'
evaluative remarks regarding the class were
used as an index of their discomfort and
dislike of the class. The hypotheses regard-
ing humanitarian, ambiguous, and factual
material were validated. The prediction
regarding dislike and discomfort ap-
proached significance, but because of the
small N, and reality factors, it failed to be
satisfactorily demonstrated. Further test-
ing would seem to be warranted. It may be
concluded that the more authoritarian a
person is, the more difficulty he will have
in learning psychological material.

On the basis of the present findings, it 
would seem reasonable to anticipate that 
authoritarian persons would experience 
difficulty in mastering material with hu- 
manitarian content. One would predict 
that this same difficulty would appear in 
other areas of social science, especially 
where theoretical rather than factual 
materials are utilized. Research is now 
being conducted to test this prediction. 


REFERENCES 


ADORNO, T. W., FRENKEL-BRUNSWICK,
ELSE, LEVINSON, D. J., & SANFORD, R.
N. The authoritarian personality. New
York: Harper, 1949.


Received January 26, 1959. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 5, 1959


DIAGNOSTIC RATING OF TEACHER PERFORMANCE¹


DON J. COSGROVE 


Procter & Gamble Company, Cincinnati, Ohio 


The present article outlines a new 
method for evaluating the effectiveness of 
the teaching performance of secondary 
school and college instructors. The method 
is a modification of the forced-choice tech- 
nique, and the paper shows how it might 
be used to produce diagnostic profiles val- 
uable for alerting a teacher as to what his 
students consider to be the relatively 
strong and weak aspects of his perform- 
ance. 

Diagnostic application of rating prin- 
ciples was begun by grouping phrases de- 
scriptive of teacher performance into cate- 
gories according to similarity of behavioral 
reference—the grouping later being verified 
by statistical means. Use of these grouped 
phrases in a rating scale allows for totaling 
category scores which in turn may be pre- 
sented in “diagnostic” or “descriptive” 
profile form. The approach stems from
original work of Wherry (1950) at Ohio
State University.


DEVELOPMENT OF THE RATING FORM


Initial Selection of Phrases 


The first step in the development of the 
teacher rating form was the selection of a 
pool of phrases which described specific 
elements of teacher performance. A list of 
900 descriptive phrases is given in the re- 
port of a study of the control of bias in 
rating carried out under the direction of 
Wherry (1950) at Ohio State under the 
sponsorship of the Personnel Research Sec- 
tion of The Adjutant General’s Office. One 
hundred ninety-six phrases were selected 
from this list because of their pertinence
to the general teaching methods used in
the educational psychology program in
which the present study took place. Four
additional phrases were composed to bring
the total to 200. All of the selected phrases
were short descriptions of teacher behavior,
and all were of a positive or favorable
nature. Those taken from the AGO study
were originally gathered from paragraphs
written by educational psychology students
who were asked to write short essays
describing “good” teachers. Each of the 200
phrases was written on a plain white 3 x 5
index card, one phrase to a card.

¹ This article is taken from a doctoral
dissertation completed at Ohio State University.
The writer acknowledges the valuable
assistance of Robert J. Wherry on this
project and thanks him for it.

Five instructors in the educational psychology
program, plus an additional graduate
student in psychology, independently
sorted the phrases into five specified
subareas of general teacher performance. The
subareas were preselected from a larger
group suggested in the AGO study mentioned
earlier and bore the following titles:

A. Mastery and Organization of Subject Matter
B. Skill in the Control and Discipline of Students
C. Reasonableness of Demands on Student Time and Effort in View of Help Given
D. [illegible] of Class Management Procedure
E. Skill in Motivating, Inspiring, and Creating Confidence in Students

The 150 phrases on which there was most
agreement by sorters were retained.

The Descriptive Check List 


The 150 descriptive phrases were
presented in the form of a descriptive check
list to 100 educational psychology students
at Ohio State University. The students
were divided into three groups. Each
member of the first group was asked to call
to mind some previous instructor to whom
he could give a particular rank between 1
and 10 on an over-all scale of 1 (worst) to
20 (best), among a typical group of 20 instructors.
Students in the second group
were asked to choose instructors who could
be ranked between 11 and 20 on the same
scale, and members of the third group were
asked to think of instructors to whom they
could assign ranks between 9 and 16 on
the scale. The grouping procedure was
followed to increase the probability of a
normal distribution of ranks being assigned
to the 100 instructors.

Each student was then asked to consider
the 150 phrases, one at a time, as they
applied to the instructor he had in mind
and to assign a value to each phrase indicating
the degree to which he thought the
phrase described the teacher. A value of 5
indicated great applicability, and a value
of 1 little applicability, while values 2, 3,
and 4 represented intermediate points along
the scale. Thus, if a student thought the
phrase, “Contagious enthusiasm for subject,”
applied very well to the instructor,
he filled in space No. 5 for that phrase
on the answer sheet. If a student
thought the phrase had little applicability
to the instructor, he filled in space No. 1.
The descriptive check list responses furnished
the raw data required for computing
scale values for the phrases and
selecting those to be used on the final rating
form.

Preference Indices


A preference index was established for
each phrase by computing the mean applicability
value given the phrase on the 100
check lists. This index expressed the degree
of willingness of the students to apply a
phrase to the behavior of the instructors
being considered. Preference indices provided
the first basis for assigning phrases
to sets in the final rating form.
sep, « 

Discrimination Indices


A second index, the discrimination index,
was computed for each phrase by correlating
applicability values assigned the
phrase and the ranks (from 1 to 20) given
to the instructors. The discrimination index
provided a measure of the degree to which
the behavior described by a phrase was
related to successful teaching.
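
The two indices can be illustrated with a short sketch. It assumes that "correlating" means an ordinary product-moment correlation, which the text does not state. The function names and the five toy check lists are hypothetical and are not data from the study.

```python
from math import sqrt


def preference_index(values):
    """Mean applicability value (1 to 5) given to one phrase across check lists."""
    return sum(values) / len(values)


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def discrimination_index(values, ranks):
    """Correlation between a phrase's applicability values and instructor ranks."""
    return pearson_r(values, ranks)


# Toy data (hypothetical): five check lists for a single phrase.
applicability = [1, 2, 3, 4, 5]   # value each student gave the phrase
ranks = [2, 6, 10, 14, 18]        # over-all rank (1 to 20) each student gave his instructor

pref = preference_index(applicability)
disc = discrimination_index(applicability, ranks)
```

In this toy case the phrase is applied more strongly to higher-ranked instructors on every check list, so its discrimination index is at the maximum; real phrases would fall well below that.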


Factor Analysis of the Descriptive Phrases 


Since all the phrases included on the 
descriptive check list had previously been 
sorted into five categories or subareas of 
teacher activity, it was possible to add the 
applicability values given to all phrases 
within each category to obtain category 
scores. That is, each of the answer sheets 
produced a score for each subarea, giving 
a total of five scores. 

Beginning with the correlations between 
the scores for the five subareas and the 
correlation of each of the phrases with each 
of the subareas, a factor analysis of the 
items was achieved. The general procedure 
was a modification of a method for factor- 
ing large numbers of items reported by 
Wherry and Winer (1953). This method 
was used because it avoided the necessity 
of correlating each item with every other 
which would result in a matrix of unman- 
ageable size. 
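
A minimal sketch of the bookkeeping behind this step follows. It shows only how category scores are summed and how each item is correlated with each subarea score, not the Wherry and Winer factoring itself. With 150 items and five subareas this calls for 150 x 5 = 750 item-subarea correlations rather than the 150 x 149 / 2 = 11,175 inter-item correlations. The phrase labels, category letters, and answer sheets below are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def category_scores(sheet, category_of):
    """Sum the applicability values of all phrases sorted into each category."""
    totals = {}
    for phrase, value in sheet.items():
        category = category_of[phrase]
        totals[category] = totals.get(category, 0) + value
    return totals


def item_category_correlations(sheets, category_of):
    """Correlate every phrase with every category score across answer sheets.

    Note that a phrase contributes to the total of its own category.
    """
    scores = [category_scores(sheet, category_of) for sheet in sheets]
    categories = sorted(set(category_of.values()))
    table = {}
    for phrase in category_of:
        item = [sheet[phrase] for sheet in sheets]
        table[phrase] = {
            c: pearson_r(item, [s[c] for s in scores]) for c in categories
        }
    return table


# Hypothetical miniature example: three phrases, two categories, four sheets.
category_of = {"p1": "A", "p2": "A", "p3": "B"}
sheets = [
    {"p1": 1, "p2": 2, "p3": 5},
    {"p1": 2, "p2": 3, "p3": 4},
    {"p1": 4, "p2": 4, "p3": 2},
    {"p1": 5, "p2": 5, "p3": 1},
]
table = item_category_correlations(sheets, category_of)
```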

Analysis of the items resulted in the ex- 
traction of a general factor of over-all 
teacher effectiveness, two subgenerals and 
four specific factors. The use to which
these results were put will be shown in the
next section.


Final Rating Form 


All the information necessary for the 
construction of the final rating form was 
now available. The plan was to group 
phrases into 10 sets of four phrases each. 
Each phrase within a set was to have a 
significant loading on a different one of 
the four specific factors. The task of the 
rater was then to rank the phrases within 
each set as they described the teacher being
considered.

Grouping phrases into sets was done
first in terms of preference indices. The
aim was to have phrases in each set which
raters would be equally willing to use
in describing a teacher. That is, the four
phrases should have similar preference indices.
It was also desirable for the phrases to
have similar discrimination indices. This
provided assurance that all phrases in a set
were about equally related to general teaching
effectiveness. It was hoped, too, that
phrases within a set would, as nearly as
possible, have equal loadings on the general
factor of teaching effectiveness. Of course,
each phrase had to have a significant loading
on a different one of the four specific
factors and negligible loadings on the
others. Ten sets of phrases were thus
chosen. This meant that 40 of the pool of
150 phrases were used.

The titles given to the four specific factors
of teacher performance as suggested
by the phrases chosen to represent them
were:

1. Knowledge and Organization of Subject Matter
2. Adequacy of Relations with Students in Class
3. Adequacy of Plans and Procedures in Class
4. Enthusiasm in Working with Students

The form is called “The Descriptive
Ranking Form for Teachers” and is shown
here in its entirety. A number follows each
phrase to indicate the subarea to which
the phrase belongs. These numbers do not
appear on the form when it is used.


This form consists of 10 sets of phrases
which are descriptive of instructor performance.
Each set is composed of four phrases.

Please consider the instructor you have
for this course. In each set of phrases, rank
the phrases from 1 to 4 as they apply to your
instructor. Give a rank of 1 to the phrase
which most applies, and 4 to the phrase
which least applies, using Ranks 2 and 3 for
the intermediate ranks. Every phrase must
be ranked. There can be no equal ranks.

Set a
___ Always on time for class [3]
___ Pleasant in class [2]
___ Very sincere when talking with students [4]
___ Well-read [1]

Set b
___ Contagious enthusiasm for subject [4]
___ Did not fill up time with trivial material [3]
___ Gave everyone an equal chance [2]
___ Made clear what was expected of students [1]

Set c
___ Classes always orderly [3]
___ Enjoyed teaching class [4]
___ Friendliness did not seem forced [2]
___ Logical in thinking [1]

Set d
___ Encouraged creativeness [4]
___ Kept course material up to the minute [1]
___ Never deliberately forced own decision on class [2]
___ Procedures well thought out [3]

Set e
___ Authority on own subject [1]
___ Friendly attitude toward students [4]
___ Marked tests very fairly [3]
___ Never criticized in a destructive way [2]

Set f
___ Good sense of humor [4]
___ Spaced assignments evenly [3]
___ Students never afraid to ask questions in class [2]
___ Well organized course [1]

Set g
___ Accepted students’ viewpoints with open mind [2]
___ Increased students’ vocabulary by own excellent usage [1]
___ Students always knew what was coming up next day [3]
___ Students willingly worked for teacher [4]

Set h
___ Always knew what he was doing [3]
___ Appreciated accomplishment [2]
___ Did not ridicule wrong answers [4]
___ Well informed in all related fields [1]

Set i
___ Always had class material ready [3]
___ Covered subject well [1]
___ Encouraged students to think out answers [4]
___ Rules and regulations fair [2]

Set j
___ Always managed to get things done on time [3]
___ Course had continuity [1]
___ Made material significant [4]
___ Understood problems of students [2]

TABLE 1

A PROFILE RESULTING FROM USE OF THE DESCRIPTIVE RANKING FORM FOR TEACHERS

Scores: plotted along a scale marked Low, 10, 15, 20, 25, 30, ... (remaining scale marks and plotted points not recoverable)

Subareas

1. Knowledge and Organization of Subject Matter
2. Adequacy of Relations with Students in Class
3. Adequacy of Plans and Procedures in Class
4. Enthusiasm in Working with Students


The form is scored by adding across sets
the ranks given to the phrases of each
subarea. In scoring, the rank values are
reversed: a phrase ranked 1 is scored 4, a
phrase ranked 2 is scored 3, and so on.
For [illegible] purposes these subarea
scores may be put into profile form, clearly
showing the relative subarea ratings given
the teacher. Such a profile is shown in
Table 1. Higher subarea scores mean higher
ranking of the phrases of the subarea. On the
profile in Table 1, phrases of Area 3, Adequacy
of Plans and Procedures in Class,
were ranked highest and phrases of Area 4,
Enthusiasm in Working with Students,
were ranked lowest.

The teacher's standings on the subareas
are relative to one another and not
to any outside standard. The profile does
not indicate how a teacher compares with
other teachers in general effectiveness.
What it does show is the relatively strong
and weak areas of his own performance, so
that he will know where he might best
start in improving his teaching skill.

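
The scoring rule just described can be sketched in a few lines. The key used here is illustrative only: it repeats the printed order of Set a ([3], [2], [4], [1]) for all ten sets, whereas the real form keys each set differently. The function name and the sample ranks are hypothetical.

```python
def score_form(form_key, ranks):
    """Return subarea totals for one completed form.

    form_key: one list per set giving the subarea number (1 to 4) of each phrase.
    ranks:    one list per set giving the rank (1 to 4, no ties) the student
              assigned to each phrase, in the same order as form_key.
    """
    totals = {1: 0, 2: 0, 3: 0, 4: 0}
    for key_set, rank_set in zip(form_key, ranks):
        if sorted(rank_set) != [1, 2, 3, 4]:
            raise ValueError("each set needs the ranks 1, 2, 3, 4 exactly once")
        for subarea, rank in zip(key_set, rank_set):
            totals[subarea] += 5 - rank  # reversed: rank 1 scores 4, rank 4 scores 1
    return totals


# Illustrative key (assumption): Set a's printed order repeated for ten sets.
FORM_KEY = [[3, 2, 4, 1] for _ in range(10)]

# A hypothetical student who ranks every set the same way: the Area 3 phrase
# first, the Area 1 phrase second, the Area 2 phrase third, the Area 4 phrase last.
sample_ranks = [[1, 3, 4, 2] for _ in range(10)]

profile = score_form(FORM_KEY, sample_ranks)
```

Because every set hands out the same four rank values, the four subarea scores on any form range from 10 to 40 and always total 100. That is why the profile can show only relative strengths and weaknesses within a teacher's own performance.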
The form was not validated against any
sort of outside criteria of performance in
the four areas. In regard to validity, however,
it can be pointed out that the
individual phrases in the form are
related to teacher effectiveness.
The discrimination indices of the phrases
are high, the median being .46,
with Q1 at .39 and Q3 at .53. As was stated,
the discrimination index indicated the degree
of relationship between the behavior
described by a phrase and teacher effectiveness
as seen by students.

The Descriptive Ranking Form was used 
by students to rate their instructors in a 
basic educational psychology course at 
Ohio State University. Ratings took place 
in 12 separate class sections of the course, 
with 8 to 12 students in each section using 
the form. The forms completed in each 
section were randomly divided into two 
equal groups. By averaging individual area 
scores resulting from the forms used by 
each of these small student groups, two 
profiles resulted for each class section. 
Cattell’s (1949) shape correlation coefficient 
was used to express the similarity of the 
shape of the two profiles of each of the 12 
pairs. The extent of this similarity provided 
a rough indication of the amount of agree- 
ment which existed between two groups of 
students using the form to rate the same 
teacher. Pooling the 12 shape correlation 
coefficients thus computed resulted in a 
coefficient of .74. 
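
The split-half procedure can be sketched as follows. Cattell's coefficient itself is not reproduced here. As a plainly labeled stand-in, the sketch compares profile shapes with an ordinary product-moment correlation across the four area scores, which need not give the same values as the coefficient used in the study. All names and data are hypothetical, and with an odd number of forms one form is simply left out (an assumption).

```python
import random
from math import sqrt


def mean_profile(forms):
    """Average the four area scores over a group of completed forms."""
    return [sum(form[i] for form in forms) / len(forms) for i in range(4)]


def split_half_profiles(forms, rng):
    """Randomly divide the forms of one class section into two equal groups."""
    shuffled = list(forms)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return mean_profile(shuffled[:half]), mean_profile(shuffled[half:2 * half])


def profile_shape_r(p, q):
    """Stand-in shape index: product-moment correlation between two profiles."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    spq = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    spp = sum((a - mean_p) ** 2 for a in p)
    sqq = sum((b - mean_q) ** 2 for b in q)
    if spp == 0 or sqq == 0:
        raise ValueError("a flat profile has no shape to correlate")
    return spq / sqrt(spp * sqq)


# Hypothetical section of eight students who all produced the same area scores.
forms = [[30, 20, 40, 10] for _ in range(8)]
first, second = split_half_profiles(forms, random.Random(1))
agreement = profile_shape_r(first, second)
```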

The advantage of the profile approach 
is that a teacher (and his supervisor) can 
consider his behavior in terms of really 
pertinent dimensions. The typical analysis 
of ratings involves a consideration of individual
item responses, and often results
in a hodge-podge conception of the competence
of a teacher. And the longer the
rating form the more difficult it is to generalize
from it. The profile approach, on
the other hand, presents a more organized
and useful analysis of the rating, giving
the teacher a clearer idea of the relatively 
strong and weak areas of his performance. 
The forced-choice format also minimizes 
bias, which so often is evidenced in typical 
graphic rating schemes. 


SUMMARY 


A method has been presented for evalu- 
ating the relative effectiveness of a teach- 
er’s performance in four areas of activity. 
Phrases descriptive of teacher behavior 
were grouped in sets of four, each phrase 
representing a different one of the areas. 
In completing the form, students rank the 
phrases of each set as they apply to the 
instructor. The end result is a profile showing
the relative standing of the instructor
on the four areas of teacher activity. This 
evaluation system will be useful to the in- 
structor who wants to know where to begin 
work on improving his effectiveness. 


REFERENCES 

CATTELL, R. B. rp and other coefficients of
pattern similarity. Psychometrika, 1949,
14, 279-298.

WHERRY, R. J. Control of bias in rating.
(Sub-Project 2) Instructor Rating Scales.
Washington, D. C.: Personnel Res. Section,
AGO, U. S. Department of the
Army, 1950.

WHERRY, R. J., & WINER, B. J. A method
for factoring large numbers of items.
Psychometrika, 1953, 18, 161-179.


Received January 26, 1959. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 5, 1959


LEARNING WITHOUT AWARENESS AND TRANSFER
OF LEARNING SETS¹


JULIUS M. SASSENRATH


University of California²


¹ This article is adapted from a Ph.D.
dissertation entitled: Transfer of learning
sets employing “learning without awareness”
procedures, submitted to the University
of California, Berkeley. The author
would like to thank Guy T. Buswell, Harold
E. Carter, Rheem F. Jarrett, Jack A.
Holmes, and Bert Y. Kersh for helpful suggestions
and criticisms. Some of the material
was presented in a paper at a joint meeting
of the [illegible] and California Educational
Research Associations in San Francisco,
[date illegible]. The data were reanalyzed during
the author’s tenure as a Public Health
Service Research Fellow of the National
Institute of Mental Health.

² Now at Indiana University.


Among the many factors involved in a
complex learning process, educational and
experimental psychologists have discussed
the [illegible] of learning sets or learning
how to learn (Buswell, 1956; Harlow, 1949).
One of the questions confronting the study
of learning sets is whether or not
learning sets are at times acquired without
the subject's (S's) awareness of what he is
learning.

While not overlooking the importance
of the learner's drives in modulating the
effectiveness of rewards, Thorndike (1935,
p. [illegible]) hypothesized that the consequences
of S-R connections could improve performance
without S understanding what he was
learning. Attempts to explore this hypothesis
gave rise to the problem of learning
without awareness (hereafter LWA). Today,
experimenters (Es) operationally
define awareness as a correct verbalization
by S of what he is learning (Adams,
1957; Philbrick & Postman, 1955). Any reliable increase
in performance prior to awareness is considered
to be LWA. Since S's performance
may increase as a function of partially correct
verbalizations for his responses, some
studies of LWA have been questioned by
Adams (1957).

Learning set has been defined as a learn- 
ing how to learn a general type of problem 
(Harlow, 1949). Harlow, in his work with 
monkeys and preschool children, not only 
found evidence for a learning set on one 
type of problem but, also, transfer of 
learning set to a series of reversal problems. 
These data prompted the suggestion that 
“the learning set delivers the animal from 
Thorndikian bondage.” Indeed, studies on 
LWA, which employed a systematic prin- 
ciple for administering reinforcement in a 
Thorndikian situation (Hirsch, 1957; Irwin, 
Kaufman, Prior & Weaver, 1934; Philbrick 
& Postman, 1955; Postman & Jarrett, 
1952; Thorndike & Rock, 1934), also have 
found that adult human Ss could be de- 
livered from “Thorndikian bondage” by 
developing a learning set to infer a prin- 
ciple. However, no study has investigated 
the question of transfer of learning sets 
employing systematic Thorndikian proce- 
dures which would permit the assessment 
of LWA in learning sets. 

The purpose of the experiment to be re- 
ported here is (a) to investigate the effects 
of two procedures for developing a learning 
set to infer a principle in training and the 
subsequent effect of the training on the 
development of a learning set to infer a 
reversal principle during the transfer 
period; (b) to compare the LWA process 
and final criterion performance of the two 
training conditions; (c) to determine 
whether or not the two experimental groups 
show less LWA than the two control 
groups during the transfer period; and (d) 
to indicate whether or not LWA can be 
attributed solely to S's partially correct
verbalizations for his performance.


METHOD 


Task. Each S was presented with a series
of individual stimulus words from 2 to 10
letters in length, and was required to re- 
spond to each word with a number from 1 
through 9. The E administered continuous 
reinforcement. For the two experimental 
groups in the training period, S's response 
was called Right if the number was equal 
to the number of letters in the stimulus 
word minus one, e.g., well — 3, general — 
6, etc. Any other response was called 
Wrong. Thus the principle upon which 
reinforcement was administered, and which 
the Ss could learn, was based upon a direct 
relationship between the length of the 
stimulus word and the correct response, 
i.e., as words become longer the numbers 
become larger. 

In the transfer situation an entirely dif- 
ferent series of stimulus words was used. 
However, the stimulus class remained 
words from 2 to 10 letters in length and 
the response class numbers from 1 through 
9. An S's response was now called Right 
if the number was equal to 11 minus the 
number of letters in the stimulus word, e.g.,
well — 7, general — 4, etc. Any other response
was called Wrong. Thus the principle
upon which reinforcement was now admin- 
istered, and which S could learn, was the 
"reverse" of the training principle, i.e., as 
words become longer the number becomes 
smaller. This will be called the reversal 
principle. 
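
The two reinforcement principles are simple enough to state as code. The sketch below follows the rules given in the text; the function names are hypothetical.

```python
def training_correct(word):
    """Training principle: the Right number is the word's length minus one."""
    return len(word) - 1


def reversal_correct(word):
    """Reversal principle: the Right number is 11 minus the word's length."""
    return 11 - len(word)


def feedback(word, response, principle):
    """Return what E would say after S gives a number response to a word."""
    return "Right" if response == principle(word) else "Wrong"


# The examples given in the text.
examples = {
    "training": (training_correct("well"), training_correct("general")),
    "reversal": (reversal_correct("well"), reversal_correct("general")),
}
```

For words of 2 to 10 letters both principles yield numbers from 1 through 9, so the response class is the same in training and in transfer and only the direction of the relationship changes.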

Materials and procedure. The stimulus 
material consisted of 360 common English 
words, each typed in capital letters on a 4 
by 6 white index card. Stimulus cards were 
presented at intervals of approximately 5 
sec. After every block of nine trials, Ss in 
the two training groups and the four trans- 
fer groups were asked, “Upon what basis 
were you responding?” Each S's response 
to this question was recorded by E along 
with S's number response. It was intended
that this question serve two purposes: (a)
to elicit a different class of responses (from
the number responses) to aid in an interpretation
of the learning process and (b)
to direct attention to a basis or principle
for responding. When Ss in the training
and transfer periods reached a criterion
where they could correctly verbalize the
principle followed by errorless responses
for one block of trials, the training and
transfer periods were terminated.

In the training period, for the two experimental
groups, there was a maximum of
162 stimulus words arranged in 18 blocks
of 9 words each. One of the experimental
groups was presented with a homogeneous
sequence in which each block of trials contained
different words with the same number
of letters (hereafter, Exp. Hom.). In
this group, the order in which blocks of
words of different lengths appeared was
determined randomly. A second experimental
group was presented with a heterogeneous
sequence in which each block of
trials contained different words with one
word of each length occurring within each
block (hereafter, Exp. Het.).
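
The difference between the two presentation treatments can be sketched by generating only the word lengths in each block. The function names are hypothetical. The sketch makes two assumptions that the text does not state: that each word length heads the same number of homogeneous blocks, and that the order of lengths within a heterogeneous block is random.

```python
import random

LENGTHS = list(range(2, 11))  # word lengths of 2 through 10 letters


def homogeneous_blocks(rng, n_blocks=18):
    """Blocks of nine words that all share one length, in random block order."""
    order = LENGTHS * (n_blocks // len(LENGTHS))  # assumption: equal use of lengths
    rng.shuffle(order)
    return [[length] * 9 for length in order]


def heterogeneous_blocks(rng, n_blocks=18):
    """Blocks of nine words holding exactly one word of each length."""
    blocks = []
    for _ in range(n_blocks):
        block = list(LENGTHS)
        rng.shuffle(block)  # assumption: random order within the block
        blocks.append(block)
    return blocks


rng = random.Random(0)
hom = homogeneous_blocks(rng)
het = heterogeneous_blocks(rng)
```

In a homogeneous block the same number is Right on all nine trials, whereas in a heterogeneous block each of the nine numbers is Right exactly once.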

The two control groups had only a
“warm-up” during the “training” period
which consisted of guessing numbers in
response to each of 45 stimulus words, with
no acknowledgement of Right and Wrong
by E. This procedure was not only employed
as a control for warm-up effects
but, also, to establish an empirical chance
level of performance. One group was given
a homogeneous sequence (hereafter, Cont.
Hom.) and the other a heterogeneous sequence
(hereafter, Cont. Het.) of stimulus
words. During neither of the two warm-up
procedures were Ss asked, “Upon what
basis were you responding?”

The transfer period immediately followed
the training and warm-up periods. There
was a maximum of 198 stimulus words
arranged in 22 blocks of nine words each.
All four groups (two experimental and two
control) during the transfer period were
presented with a heterogeneous sequence.

Instructions. Before beginning the training
procedure, Ss in the two experimental
and the two control groups were read the following
instructions:

I have prepared a list of words and have
[illegible] a number with each word. I have
[illegible] from 1 through 9. I have the
[illegible] written on a paper on which I
shall record your responses. It is a long list
and we shall go through it only once. Try to
give your responses as quickly as you can.
Are there any questions?

In addition, Ss in the two experimental groups
were told:

I am interested in the number-response
which you learn to give to each word, as a
function of my saying Right or Wrong after
each word.

It was intended that this latter instruction
give to S an “intentional set” to
learn. Other studies using similar procedures
have employed an “incidental set”
(Hirsch, 1957; Philbrick & Postman,
1955). Before the training began, any
questions pertaining to instructional procedure
were answered.

The following instructions were read to
each S before beginning the transfer procedure:

Now I shall show you a different set of cards
and a different set of words. Again, however,
I shall present you with a series of cards on
which you will see one word. Respond to
each word with a number from 1 through 9.
I am interested in the number-response
which you learn to give to each word, as a
function of my saying Right and Wrong
after each word.

Design. There were four groups of 20 Ss
each, two experimental and two control
groups. The transfer situation consisted of
a 2 x 2 factorial design in which the factors
were the presentation treatments (homogeneous
or heterogeneous) and the
training treatments (experimental or control)
which the four groups received during
the training and warm-up periods.

Subjects. The Ss were 80 students from
[illegible] classes in education at the
University of California, Berkeley, who
were assigned by means of a table of random
numbers to one of the four groups,
with a final correction for equal Ns. Each
S volunteered to participate in the experiment;
by such a selection, the sample may
have been restricted with regard to motivation
and ultimate performance.


RESULTS 


The data for the training and transfer 
periods have been analyzed with reference 
to: (a) the over-all performance in terms 
of the number of blocks of trials to criterion 
and (b) an analysis of the learning process 
as reflected in learning curves. 

Trials to criterion. All of the Ss in the 
two experimental groups during the train- 
ing period were able to correctly verbalize 
the principle. However, the rapidity with 
which Ss correctly verbalized and reached 
the learning criterion differed for the two 
groups. The mean number of blocks of 
trials to criterion in training were 6.20 and 
9.10 for the Exp. Hom. and Exp. Het. 
groups, respectively. The mean difference
was significant beyond the .02 level (t =
2.62, df = 38). Thus, as was expected, the
group presented with a series of different
words of the same length in each block
(Exp. Hom.) learned how to learn the
principle more rapidly than the group presented
with a series of different words of
varying length in each block (Exp. Het.).

As was found during the training period,
all Ss in the four groups during the transfer
period were able to correctly verbalize the
principle. The mean number of blocks of
trials to criterion in the transfer period
were 10.95, 10.90, 17.00, and 17.20 for the
Exp. Hom., Exp. Het., Cont. Hom., and
Cont. Het. groups, respectively. An analysis
of variance was conducted to test the
statistical significance of the differences among the four
groups. The main effect of the training
(versus control) treatments during training
on the transfer situation was highly significant
(F = 54.37, df = 1 and 76, p < .01).
Therefore, learning to learn a principle in
training facilitated learning to learn the
reversal principle during the transfer period.
Thus Harlow's (1949) findings with
nonverbal primates have been somewhat
verified with highly verbal primates,
namely, university students. The presentation
treatments in training, however, did
not contribute a source of variance on the
number of blocks of trials to criterion in
transfer (F = .00). Furthermore, the interaction
(training X presentation) can be
accounted for in terms of random sampling
(F = .02).

Analysis of learning curves. With nine 
stimulus words per block and nine possible 
numbers to choose from, the number of 
correct responses expected by chance is 1.0. 
As Philbrick and Postman (1955) have reported,
the probability is less than .01 that
S could obtain four correct responses per
block of nine trials by chance alone. In an
effort to establish an empirical chance level,
Ss in each of the two control groups were
given five blocks of nonreinforced trials.
The mean performance curves for the two
control groups fell at the chance level of
expectation.
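
The chance expectation can be worked out with a small sketch. It treats every trial as an independent guess among nine equally likely numbers. That model is an assumption made here for illustration and is not necessarily the calculation behind the probability attributed to Philbrick and Postman. The function names are hypothetical.

```python
from math import comb


def expected_correct(trials=9, options=9):
    """Expected number of Right responses per block under pure guessing."""
    return trials / options


def prob_at_least(k, trials=9, options=9):
    """Binomial probability of k or more Right responses in one block."""
    p = 1 / options
    return sum(
        comb(trials, j) * p ** j * (1 - p) ** (trials - j)
        for j in range(k, trials + 1)
    )


chance_level = expected_correct()  # nine trials, nine equally likely responses
```

Calling `prob_at_least(4)` gives the tail probability for four or more correct responses under this simple model.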
Figure 1 presents the mean number of
correct responses prior to correct verbalization
of the principle for the two experimental
groups during the training period. Since
both groups had errorless performance following
correct verbalization of the principle,
the two blocks of trials on which S
correctly verbalized the principle and then
reached criterion are omitted in Figure 1.
Inasmuch as Ss verbalized the principle
after different numbers of blocks of trials,
the data were made comparable by the use
of Vincent curves (Hilgard, 1938). A log
transformation of the data was performed
in order to reduce heterogeneity of variance.

[Figure 1: mean number of correct responses (vertical axis) plotted against blocks of trials in Vincent quarters (horizontal axis), with separate curves for Exp. Hom. and Exp. Het.]

FIG. 1. PERFORMANCE PRIOR TO CORRECT VERBALIZATION OF THE PRINCIPLE DURING
THE TRAINING PERIOD.

Grant’s (1956) procedure for the statistical
analysis and comparison of curves was
applied to the scores comprising the curves
seen in Fig. 1. This statistical analysis
indicates the over-all trend or slope of the
curves is significantly different from zero
(F = 28.36, df = 3 and 114, p < .001).
Thus, there is substantial learning prior to
correct verbalization of the principle. However,
when analyzing for the type of slope
of the over-all trend, only its linear component
was significant (F = 36.41, df = 1
and 38, p < .001). Therefore, the process of
learning to learn over successively different
S-R associations appears to show a
constant increase prior to correct verbalization
of the principle. The between-group
means is not significant (F = .05), and
indicates that the average number of correct
responses over the Vincent quarters is
very similar for the two groups. However,
since the statistical analysis indicates that
the between-group trends is significant
(F = 4.02, df = 3 and 114, p < .01), it is
apparent that the slopes for the two learning
curves are different with group Exp.
Hom. having the steeper slope. When analyzing
for the type of slope of the between-group
trends, only its linear component is
significant (F = 5.87, df = 1 and 38, p <
.05). Thus, although the two groups learned
at different rates, the rate of improvement
was constant within each group. Finally, a
significant difference in the between-individual
means (F = 9.76, df =
38 and 114, p < .001) indicates that reliable
measures of individual differences were
obtained.
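
How scores from Ss with different numbers of blocks can be brought onto a common four-point scale is sketched below. The text cites Hilgard (1938) for the Vincent method without spelling out a variant, so the proportional allocation used here is an assumption. The sketch also lists the standard orthogonal polynomial weights for four equally spaced points, which underlie the linear and quadratic trend components. The F tests of Grant's procedure are not reproduced, and all names are hypothetical.

```python
def vincent_quarters(scores, parts=4):
    """Average one S's block scores within equal fractions of his own series.

    Each block is treated as a unit interval. A block that straddles the
    boundary between two fractions contributes to both in proportion to the
    overlap.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("at least one block is needed")
    width = n / parts
    means = []
    for q in range(parts):
        low, high = q * width, (q + 1) * width
        total = 0.0
        for i, score in enumerate(scores):
            overlap = min(high, i + 1) - max(low, i)
            if overlap > 0:
                total += score * overlap
        means.append(total / width)
    return means


# Standard orthogonal polynomial weights for four equally spaced points.
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)


def contrast(values, weights):
    """Weighted sum of the four quarter means for one trend component."""
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical S who needed eight blocks before verbalizing the principle.
quarters = vincent_quarters([1, 1, 3, 3, 5, 5, 7, 7])
linear_score = contrast(quarters, LINEAR)
quadratic_score = contrast(quarters, QUADRATIC)
```

A steadily rising series like this one yields a large linear contrast and a zero quadratic contrast, the pattern described above as a constant rate of improvement.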
A more critical analysis of the performance
of group Exp. Hom. appears warranted
since neither Philbrick and Postman
(1955) nor Hirsch (1957) found such high
levels of performance prior to correct verbalization.
It will now be shown that the
high performance prior to correct statement
of the principle is not based solely
upon partially correct verbalizations of
the principle. In order to facilitate this
analysis, Hirsch’s (1957) procedure was
employed in which a partially correct verbalization
was defined as a statement by
S indicating that the magnitude of his
number responses was related to the number
of letters in the stimulus words. Any
other basis for responding was judged to
be an incorrect statement of the principle.
If improvement prior to correct verbalization
is to be attributed to partially correct
verbalizations, then the largest number of
such verbalizations should be offered on
the block immediately prior to correct
verbalization. Of the 20 Ss in group Exp.
Hom. in training, 15 offered incorrect verbalizations
and 5 offered partially correct
verbalizations on the block of trials immediately
prior to correct verbalization of
the principle. Yet, the mean difference between
the two subgroups in the number of
correct responses on the block of trials
immediately prior to correct verbalization
is not significant (t = .16). Thus, the high
level of performance by Ss offering incor- 
rect verbalizations of the principle in 
Group Exp. Hom. tends to invalidate the 
argument that partially correct verbaliza- 
tions can be invoked a priori to define away 
the phenomenon of LWA. 

Figure 2 presents the mean number of 
correct responses prior to correct verbaliza- 
tion of the reversal principle for the two 
experimental and two control groups during 
the transfer period. Again, since all four 
groups had errorless performance following 
correct verbalization of the reversal principle,
the two blocks of trials on which S
correctly verbalized the principle and then 
reached criterion are omitted in Fig. 2. 
Following the construction of Vincent 
curves (Hilgard, 1938), a log transforma- 
tion of the data was performed in order to 
reduce heterogeneity of variance. 
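
The effect of such a transformation can be illustrated with a minimal sketch. The scores below are hypothetical and are not taken from the study; the base of the logarithm and the handling of zero scores are also assumptions, since the text does not specify them. When the spread of scores grows in proportion to their level, the raw variances of two groups differ widely, while the variances of the logarithms do not.

```python
import math
from statistics import variance

# Hypothetical scores (not from the study): the second group's scores are
# ten times the first group's, so its spread is far larger on the raw scale.
group_a = [2.0, 4.0, 8.0]
group_b = [20.0, 40.0, 80.0]


def variance_ratio(x, y):
    """Ratio of the larger sample variance to the smaller one."""
    vx, vy = variance(x), variance(y)
    return max(vx, vy) / min(vx, vy)


# Heterogeneity before and after a base-10 log transformation (assumed base).
raw_ratio = variance_ratio(group_a, group_b)
log_ratio = variance_ratio(
    [math.log10(v) for v in group_a],
    [math.log10(v) for v in group_b],
)
print(raw_ratio, log_ratio)
```

A ratio near 1 after transformation indicates that the group variances have been brought into line, which is the condition the analysis of variance assumes.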

Grant’s (1956) procedure for the statis- 
tical analysis and comparison of curves was 
applied to the scores comprising the curves 
seen in Fig. 2. The over-all trend or slope 
of the curves is significantly different from 
zero (F = 114.16, df = 3 and 228, p < 
.001). Thus, there is substantial learning 
prior to correct verbalization of the reversal 
principle. Furthermore, when analyzing 
the over-all trend for the type of slopes 
represented in these curves, both the linear 
component (F = 371.69, df = 1 and 76,
p < .001) and the quadratic component 
(F = 4.85, df = 1 and 76, p < .05) are 
significant. These latter results indicate 
that the rate of learning is largely constant 
with a small positive acceleration which is 
probably not due to chance alone. Thus, 
the process of learning the reversal principle under the conditions
of this study is only slightly curvilinear. The difference among the
between-group means appears to be highly significant (F = 12.19,
df = 3 and 76, p < .001), indicating that the average number of
correct responses over the Vincent quarters differs among the four
groups. When the source of variation of the between-group means is
partitioned into the two main and interaction effects, only the main
effect of training (versus control) is significant (F = 34.73,
df = 1 and 76, p < .001).

Fig. 2. Performance prior to correct verbalization of the principle
during the transfer period. (Ordinate: mean number of correct
responses; abscissa: blocks of trials in Vincent quarters; curves
for groups Exp. Hom., Exp. Het., Cont. Hom., and Cont. Het.)
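
The partition of a trend into linear and quadratic components can be sketched with the standard orthogonal polynomial coefficients for four equally spaced points, here taken to be the four Vincent quarters. This is an illustrative sketch only: the quarter means are hypothetical, equal numbers of Ss per quarter are assumed, and the function names are not part of Grant's (1956) procedure.

```python
# Standard orthogonal polynomial coefficients for four equally spaced points.
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)


def contrast(means, coeffs):
    """Weighted sum of the quarter means for one trend component."""
    return sum(m * c for m, c in zip(means, coeffs))


def trend_shares(means):
    """Share of the between-quarter variation carried by each component.

    Assumes an equal number of observations behind every mean.
    """
    grand = sum(means) / len(means)
    ss_total = sum((m - grand) ** 2 for m in means)
    ss_lin = contrast(means, LINEAR) ** 2 / sum(c * c for c in LINEAR)
    ss_quad = contrast(means, QUADRATIC) ** 2 / sum(c * c for c in QUADRATIC)
    return ss_lin / ss_total, ss_quad / ss_total


# Hypothetical means over the four quarters (not the values plotted in Fig. 2).
hypothetical_means = [2.0, 3.0, 4.1, 5.4]
lin_share, quad_share = trend_shares(hypothetical_means)
print(round(lin_share, 4), round(quad_share, 4))
```

A curve of this kind, in which nearly all of the variation falls in the linear component and a small remainder in the quadratic component, is what is meant by a rate of learning that is largely constant with slight acceleration.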

This finding indicates, as may be seen in 
Fig. 2, that the control groups gave a larger 
number of correct responses than the ex- 
perimental groups. However, as pointed 
out earlier, the two control groups required 
significantly more blocks of trials to crite- 
rion than did the experimental groups. 
Hence the training treatments given the 
experimental groups, as compared with the 
control groups, resulted in dual effects 
during transfer: (a) fewer correct responses 
prior to awareness, and, at the same time, 
(b) fewer trials to reach the awareness 
criterion. The difference in between-group trends is not
significant (F = .46), indicating that the rates of learning among
the groups are very similar. Partitioning of the source of variation
of the between-group trends into linear and quadratic components
with respective main and interaction effects showed no significant
F values. Finally, a highly significant difference between
individual means (F = 4.01, df = 76 and 228, p < .001) indicates
that reliable measures of individual differences were obtained.

Can the large increase in performance during the transfer period for
the two experimental and two control groups prior to verbalized
awareness be accounted for by partially correct verbalizations of
the principle? Of the 80 Ss, 69 gave partially correct and 11 gave
incorrect verbalizations on that block of nine trials when they
first reached four correct responses. The probability of obtaining
four correct by chance alone is .01 (Philbrick & Postman, 1955).
Furthermore, on that block of trials immediately preceding correct
verbalization of the principle, 73 Ss gave partially correct and
seven gave incorrect verbalizations of the principle. Therefore, it
appears that the high level of performance under the conditions of
the transfer period can be attributed to partially correct
verbalizations of the reversal principle.


TRANSFER OF LEARNING SETS 211 


DISCUSSION


It is, of course, apparent that an interpretation of the results is
dependent upon one's definition of awareness. A customary definition
has been a correct verbalization by S about the reasons governing
his responses (Adams, 1957; Postman, 1955). Such a definition does
not preclude improvement resulting from partially correct
verbalizations. If one is still content to use the customary
definition, then group Exp. Hom. during training showed some LWA,
though in absolute amounts very little, while during the transfer
period the two experimental groups, and particularly the two control
groups, showed strong evidence for LWA.
If one does not agree with the usual definition of awareness, but
more stringently defines awareness as partially correct
verbalization by S for his behavior, then group Exp. Het. during
training and the four groups during transfer showed very little, if
any, LWA. However, if awareness, in this study, is redefined as
partially correct verbalizations of the principle and "partially
correct" is defined as number responses by S which are related to
the number of letters in stimulus words, this experiment provides
substantial evidence for LWA. Witness the results indicating that
most Ss in group Exp. Hom. demonstrated a high level of performance
and yet offered incorrect verbalization of the principle. Hirsch
(1957) has also found that partially correct verbalizations could
not entirely explain his evidence for LWA. If this is the case, then
there must be some validity in Thorndike's (1935, p. 40) hypothesis
that the consequences of different S-R associations can operate to
increase performance without S understanding what he is learning.
Yet, depending upon the experimental conditions, it appears that
learning to learn a principle can proceed, in part, without
awareness, followed by partial awareness and then total awareness.


Finally, for educational theory and prac- 
tice two points could be mentioned. First, 
Estes (1956) has pointed out that research 
on LWA does not neglect the importance 
of those variables which “lead the S to 
verbalize relationships.” However, it should 
be emphasized that for education it is par- 
ticularly important to identify those vari- 
ables or conditions which foster LWA. As 
a result, educators could possibly better 
define the conditions which maximize 
awareness (verbalization) or understanding 
of relationships. The question then arises 
whether or not the problem of LWA versus 
learning with awareness in psychology is 
similar to the old problem of mechanical 
versus meaningful learning in education. 

Second, research on learning sets or 
learning how to learn, employing preschool 
children, may indicate whether or not one’s 
facility to learn could be greatly increased 
by early and extended learning experiences. 
If confirmed, then such learning experience 
would be contrary to the theory of “wait- 
ing for nature to produce maturation before 
introducing topics” (Cronbach, 1950, p. 
237). Thus educational theory, practice, 
and research might re-emphasize learning 
how to learn rather than just learning 
when to learn. 


SUMMARY 


This investigation is concerned with the 
influence of two procedures for presenting 
stimulus material on learning a principle in 
training and the subsequent effect of this 
training on learning a reversal principle 
during the transfer period. The group in 
training which received a heterogeneous 
stimulus-word presentation showed only 
little evidence for LWA, while the group 
which received a homogeneous stimulus- 
word presentation gave substantial evi- 
dence for LWA. For the latter group, 
partially correct verbalizations for the 
principle could not account for this LWA. 

The group which received a homogeneous 
presentation inferred the training principle 
more readily than the group which received
a heterogeneous presentation. Yet, the two
procedures for presenting the stimulus 
material in training had no differential 
effect on learning the reversal principle 
during the transfer period. However, learn- 
ing to learn the training principle under 
either of the two presentation treatments 
did facilitate learning to learn the reversal 
principle. During the transfer period, the 
two experimental groups—and particularly 
the two control groups—evidenced a higher 
level of performance prior to awareness of 
the reversal principle. This high level of 
performance could be attributed largely to 
partially correct verbalizations of the 
principle. 


REFERENCES 


Adams, J. K. Laboratory studies of behavior without awareness.
Psychol. Bull., 1957, 54, 383-405.

Buswell, G. T. Educational theory and the psychology of learning.
J. educ. Psychol., 1956, 47, 175-184.

Cronbach, L. J. Educational psychology. Annu. Rev. Psychol., 1950,
1, 235-254.

Estes, W. K. Learning. Annu. Rev. Psychol., 1956, 7, 1-38.

Grant, D. A. Analysis-of-variance tests in the analysis and
comparison of curves. Psychol. Bull., 1956, 53, 141-154.


JULIUS M. SASSENRATH 


Harlow, H. F. The formation of learning sets. Psychol. Rev., 1949,
56, 51-65.

Hilgard, E. R. A summary and evaluation of alternative procedures
for the construction of Vincent curves. Psychol. Bull., 1938, 35,
282-297.

Hirsch, J. Learning without awareness and extinction following
awareness as a function of reinforcement. J. exp. Psychol., 1957,
54, 218-224.

Irwin, F. W., Kaufman, K., Prior, G., & Weaver, H. B. On "learning
without awareness of what is being learned." J. exp. Psychol., 1934,
17, 823-827.

Philbrick, Emily B., & Postman, L. A further analysis of "learning
without awareness." Amer. J. Psychol., 1955, 68, 417-424.

Postman, L., & Jarrett, R. F. An experimental analysis of "learning
without awareness." Amer. J. Psychol., 1952, 65, 244-255.

Postman, L. The analysis of learning without awareness. Paper read
at American Psychological Association, San Francisco, September
1955.

Thorndike, E. L. The psychology of wants, interests and attitudes.
New York: Appleton-Century, 1935.

Thorndike, E. L., & Rock, R. T., Jr. Learning without awareness of
what is being learned or intent to learn it. J. exp. Psychol., 1934,
17, 1-19.


Received January 31, 1959.




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 5, 1959 


VALUE DIFFERENCES BETWEEN PUBLIC AND PRIVATE SCHOOL 
GRADUATES 


W. CODY WILSON 


Harvard University 


McArthur (1955) raised the question of personality differences
between upper- and middle-class adolescents and presented data in
support of a hypothesis concerning such differences. The data
presented were differences between public and private secondary
school graduates, who were freshmen at Harvard College, on responses
to Thematic Apperception Test pictures. These differences were
predicted on the basis of general knowledge of social classes and
Kluckhohn's (1953) ideas of dominant and variant value orientations.

McArthur's findings stimulate several further questions. Are these
same differences manifested on more direct measures of values? Do
the two groups of students differ on other dimensions of values? Are
these differences in values due to the different experiences of the
two groups in secondary schools, or are they mere reflections of
differences between the two groups in such background variables as
socioeconomic status and religious orientation of their families? Do
these differences persist through four years' experience in a
liberal arts college, or have they disappeared by the end of the
senior year?

As part of a larger study, further data pertinent to the question of
differences between public and private school graduates were
collected. These data do not answer all the questions raised by
McArthur's findings; they do, however, provide answers to some of
the questions and help to rephrase and sharpen some of the others.



PROCEDURE 


The data reported here are from an extensive questionnaire answered
by 165 seniors at Harvard College during the middle of the academic
year 1956-57. The
questionnaire included a number of items 
inquiring into antecedent background char- 
acteristics of the students and four sets of 
items reflecting various value dimensions. 
These latter sets of items were: (a) 16 
items on occupational and work values 
which were an adaptation and extension of 
items developed by Centers (1949); (b) 20 
items measuring the four dimensions ex- 
tracted by Bales and Couch (1956) in a 
factor analysis of a large domain of values 
covered by the Value Profile Test; (c) 12 
items reflecting the different modes of 
answering value orientation questions posed 
by Kluckhohn (1953); and (d) an item re- 
flecting academic achievement. 

The sample was composed of 88 public 
school graduates and 77 private school 
graduates who were seniors at Harvard 
College. There were no statistically signifi- 
cant differences between the two groups in 
terms of patterns of geographical origin, 
family religious orientation, college resi- 
dence, extracurricular activities, scholastic 
aptitude, or academic area of concentra- 
tion. The two groups did differ in terms of 
fathers’ occupations and family incomes, 
with the private school group containing 
more students whose fathers were in higher 
status occupations and whose family in- 
come was more than $7000 a year (p less 
than .001 in both cases). 


RESULTS 


The two groups differed, at the .05 level of confidence, on 6 of 16
items concerned with occupational values (see Table 1). More public
school graduates valued the opportunities to exercise a particular
competence, to make a contribution to society at large, and to add
to the accumulating body of knowledge and culture; and more private
school graduates valued the opportunity to make decisions and give
orders, to assume responsibility, and to be in close contact with
others.

TABLE 1
PROPORTION OF PUBLIC AND PRIVATE SCHOOL GRADUATES REPORTING CERTAIN
OCCUPATIONAL VALUES TO BE IMPORTANT IN THEIR CHOICE OF A CAREER

                                                 Proportion Reporting
                                                 Value to be Important   Probability
Value                                            Public     Private      of Difference
                                                 (N = 88)   (N = 77)

To make decisions and give orders                   23         37           .05
To assume responsibility                            55         70           .05
To be in close contact with people                  50         66           .05
To exercise a particular competence                 50         29           .01
To make a contribution to the society at large      75         59           .05
To add to the accumulating body of knowledge
  and culture                                       58         42           .05
To exercise intelligence in the solving of
  problems                                          72         60           .10
To express your own personality                     60         48           .12
To work as an individual                            10         19           .13
To be a leader                                      40         48           .25
To have varied and interesting experiences          71         78           .28
To be looked up to by others                        30         37           .25
To be assured of a steady income and
  permanent position                                37         29           .25
To make a good deal of money                        20         25           .25
To be of personal service to others                 54         60           .25
To be independent and your own boss                 36         30           .25
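
How a difference of this kind might be tested can be sketched as follows. The text does not say which test produced the probabilities in Table 1, so the two-sided normal-approximation test for two independent proportions used here is an assumption, as is the reading of the Table 1 entries as percentages; the function name is illustrative.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Table 1 entries read as percentages (an assumption); N = 88 public, 77 private.
# "To exercise a particular competence": 50 versus 29.
z_comp, p_comp = two_proportion_z(0.50, 88, 0.29, 77)
# "To make decisions and give orders": 23 versus 37.
z_dec, p_dec = two_proportion_z(0.23, 88, 0.37, 77)
print(round(z_comp, 2), round(p_comp, 3))
print(round(z_dec, 2), round(p_dec, 3))
```

A positive z indicates the value was endorsed more often by public school graduates, a negative z more often by private school graduates.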

In terms of Kluckhohn’s (1953) value 
orientations, the two groups differed, at 
the .05 level of confidence, in three of the 
four areas (see Table 2). The public school 
graduates were more Doing, Individualis- 
tically, and Man-over-nature oriented; and 
there were no statistically significant dif- 
ferences in Time orientation. 

Public and private school boys differed, 
at the .05 level of confidence, on two of 
Bales and Couch’s (1956) four factors (see 
Table 3). The public school graduates were 
higher on Equalitarian Ideology and In- 
dividual Orientation; there were no differ- 
ences between the two groups in terms of 
Acceptance of Authority and Need-determined Assertiveness.

Public school graduates had higher average grades in college (see
Table 4). The public school graduates were heavily overrepresented
in the high honors (A) category, and the private school graduates
were heavily overrepresented in the satisfactory (C) category.
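
The test of association for a table such as Table 4 can be sketched as a Pearson chi-square computed from the cell counts. This is a minimal illustration with assumed function names; the counts are entered as they appear in Table 4 here, and the statistic obtained this way need not agree exactly with the printed value. The closed-form tail probability used below holds only for two degrees of freedom.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Grade categories A, B, C for public (N = 88) and private (N = 77) graduates.
grades = [[35, 51, 2], [12, 47, 18]]
stat, df = chi_square(grades)
# For df = 2 the upper tail of the chi-square distribution is exp(-stat / 2).
p_value = math.exp(-stat / 2)
print(round(stat, 1), df, p_value < 0.001)
```

The same function applies to any two-way table of counts, but the tail probability line must be replaced for tables with other degrees of freedom.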


TABLE 2
RESPONSES OF PRIVATE AND PUBLIC SCHOOL GRADUATES TO ITEMS REFLECTING
KLUCKHOHN'S (1953) VALUE ORIENTATIONS

                                Number Agreeing
                                With Statement
Value Orientation               Public      Private      Chi
                                (N = 88)    (N = 77)     Square       P

Man-Nature Relationships
  Man-over-nature                 68          48          4.2        .05
    The forces of nature must be overcome and put to the use of
    human beings; it is a part of man's duty to overcome nature's
    obstacles
  Man-in-harmony-with-nature      39          41          1.5        .25
    There is no real separation between man and nature; the two
    are essentially in harmony
  Man-subject-to-nature           27          20          0.5        .25
    Man's destiny is subject to the whim of nature
Relational
  Individual                      35          19          4.0        .05
    Individual goals should always take precedence over the goals
    of the family and other groups
  Collateral                      28          26          0.1        .25
    A man should belong to some group whose roles and goals are
    more important than other goals for him
  Lineal                     [illegible] [illegible] [illegible] [illegible]
    The goals of one's family, and especially its continuance
    through generations, are the most important goals a person
    can have
Activity
  Doing                           68          48          4.2        .05
    A man's life becomes meaningful in terms of his
    accomplishments; that is, he should be judged by what he does
  Being-in-becoming               73          58          1.4        .25
    The goal of life is the development of all aspects of the self
    as an integrated whole
  Being                           46          38      [illegible]    .25
    For a mature personality the preferred mode of behavior is the
    spontaneous expression of one's innate self
Time
  Future                          59          58          1.4        .25
    A person should face toward the future, for there lies the
    fulfillment of life
  Past                            12          17          2.0        .20
    The primary emphasis of life should be upon the restoration
    and maintenance of the traditions of the past
  Present                         17          13      [illegible]    .25
    One should live primarily in the present, ignoring traditions
    and the future


TABLE 3
MEDIAN SCORES OF PUBLIC AND PRIVATE SCHOOL GRADUATES ON FOUR VALUE
SCALES

                                      Median Scores        Chi Square
Scale                               Public     Private     Median Test    P
                                    (N = 88)   (N = 77)

Acceptance of Authority              11.0       13.0          2.50       .12
Need-determined Assertiveness
  (Value-determined restraint)       10.9       12.7          1.6        .20
Equalitarian Ideology                15.6       13.0          4.9        .03
Individual Orientation               15.9       13.6          4.9        .03


TABLE 4
COLLEGE GRADE AVERAGES OF PUBLIC AND PRIVATE SCHOOL GRADUATES

                        Number in Each Grade Category
Group                      A         B         C

Public (N = 88)            35        51         2
Private (N = 77)           12        47        18

Chi square = 24.9; p < .001.


DISCUSSION


It is now possible to answer some of the questions raised in the
earlier discussion of McArthur's (1955) findings. First, are
differences between public and private school boys in terms of
Kluckhohn's (1953) Value Orientations (which were inferred by
McArthur from responses to a projective test) manifested in a more
direct measurement situation? The answer is a qualified yes. It is
not possible, because of the nature of the two sets of data, to make
a point by point comparison between the findings of McArthur and
those reported here. In general, however, the two studies show the
same pattern of differences: the public school graduates
exemplifying traditional American value orientations and the private
school graduates emphasizing these values to a lesser extent and
consequently displaying a somewhat variant orientation.

A second question was, do the two groups of students differ on other
dimensions of values? Again the answer is a qualified yes. The data
reported here covered three additional value areas, and the public
and private school graduates differed in each of these areas. The
qualification is concerned with the question of independence of the
value areas. It is quite clear, for example, that the value domains
of Kluckhohn and of Bales and Couch overlap; both formulations
contain a dimension of "Individualism." The extent of overlap among
the four value areas has not been determined, but it seems
reasonable, from the different formulations of the items and the
different content in them, to assume that in general they do cover
different aspects of the total possible value domain. The data
presented in this study do not completely answer the question of the
extent of value differences between public and private school
graduates, but they do indicate that the differences extend beyond
the areas covered by Kluckhohn's formulations and McArthur's data.

A third question raised by McArthur's findings was, are the
differences in values between public and private school graduates a
simple reflection of differences between the two groups in
background variables such as socioeconomic status and religious
orientation of their families? It was reported earlier that there
were no statistically significant differences between the two groups
in terms of religious orientation of family and geographical origin;
therefore, the differences cannot be attributed to these
antecedents. Public and private school boys did differ, on the other
hand, in terms of fathers' occupation and family income. The
difference in values between these two groups is not, however, a
simple reflection of these differences in fathers' occupation and
family income. When the total sample was divided into new subsamples
on the basis of, first, father's occupation and, second, family
income, the value differences described previously did not occur
between the new subsamples, but new value differences arose. For
example, there were no differences between students whose families
have an income of above $7000 a year and those whose families have
an income of less than $7000 in terms of the Bales and Couch factors
Equalitarian Ideology and Individual Orientation as there had been
between public and private school graduates, but there was a
difference between the two groups on the dimension of
Need-determined Assertiveness. Similar results were found on the
other value measures. It must be concluded, then, that the
differences between public and private school graduates are not to
be explained as a simple reflection of differences between these two
groups in terms of the background variables Religion, Geographical
Origin, Fathers' Occupation, and Family Income. It is not possible,
of course, with the data available, to answer the further question
of whether the differential values are developed in the school or as
a result of a more subtle selective process operating in the
family's choice of a public or private school for the boy.

A fourth question raised earlier was, do differences in values
between public and private school graduates persist through four
years' experience in a liberal arts college? The answer is yes. The
Ss of this study were in the middle of their senior year in college,
yet differences in values between the two groups were found which
are similar to the differences found between two such groups during
a freshman year in college. It was reported earlier that there were
no significant differences between the two groups in terms of
college residence, extracurricular activities, and academic areas of
concentration; the boys participated in the same general social and
intellectual environment in college. An explanation of this
persistence must be sought in some more subtle influence.

CONCLUSION 


At least four alternative explanations of this phenomenon may be
suggested: (a) the college experience does not have much effect upon
student values (cf. Jacob, 1957); (b) differences at the end of the
secondary school experience are so large that vestiges of them
remain after three and one-half years of common college experiences;
(c) there are informal selective social systems within the college
environment which tend to maintain and enhance cultural differences;
and (d) a combination of the previous explanations may be operating.
Information which would enable one to choose among these
alternatives is not now available.


An interpretation of these differences on the manifest level is not
difficult. The public school boys are characterized by stronger
orientation in the directions of Doing, Man-over-nature,
Individualism, Equalitarianism, and Achievement, almost a listing of
typical or dominant American values. The 
private school boys, on the other hand, re- 
flect weaker orientations in these directions 
but reflect no clear-cut alternative values. 
But, are these existing differences the sur- 
viving vestiges of a disappearing class 
differentiation, or do they hail the subse- 
quent decline of the traditional American 
value system and the emergence of a new 
ideal with the Eastern preparatory school 
graduates as an avant garde (cf. Riesman, 
1950)? For example, the occupational values of the private school
boys would seem to be particularly suited to the organizational
executive role, which is a relatively new development on the
occupational scene, while the values of the public school boys seem
more consonant with the traditional professional roles. And, indeed,
the occupational choices of 930 Harvard seniors support this
hypothesis (data collected in the larger study): the private school
graduates tend to choose bureaucratic business, and the public
school boys tend to choose science, medicine, and college teaching;
the quantitative data on law are ambiguous, but a closer inspection
of the data shows that the private school boys tend to think of law
as a preparation for an executive role in business or other
organizations, and the public school boys are interested in the
practice of law itself (chi square with two degrees of freedom
equals 24.9, p less than .001).

A question naturally arises concerning 
the generality of these findings beyond the 
population from which the sample was 
drawn. Any comments, until empirical 
data are available, must be purely specu- 
lative. On the basis of the fact that the 
value differences were not a simple reflec- 
tion of differences in certain background 
characteristics, other than type of second- 
ary school attended, the working hypothesis 
must be that these differences between 
public and private secondary school graduates
will also be found in other populations.
On the other hand, the private school is not 
as widespread a phenomenon in the South, 
Midwest, and West as it is in the North- 
east, and it may well be the case that the 
private schools in these other parts of the 
country serve different motives and func- 
tions. In such a case the working hypothesis 
would have to be revised. 


SUMMARY 


Responses to a set of value items by 88 
public school graduates and 77 private 
school graduates who were seniors in a 
large liberal arts college were compared. 
The public school boys were found to be more Doing,
Man-over-nature, Individualistic, Equalitarian, and Achievement
oriented; more public school boys valued the opportunity to exercise
a particular competence, to make a contribution to society, and to
add to the accumulating body of knowledge and culture; and they
valued less the opportunity to make decisions and give orders, to
assume responsibility, and to be in close contact with others. These
differences were not a simple reflection of antecedent background
characteristics, such as religious orientation or socioeconomic
status, but they are similar to differences found between other
groups of public and private school graduates as college freshmen.
The implications of these differences for a changing American
character are discussed.


REFERENCES

Bales, R. F., & Couch, A. C. Factor analysis of the domain of values
in the Value Profile Test. Unpublished manuscript, Soc. Relat. Dep.,
Harvard Univer., 1956.

Centers, R. The psychology of social classes. Princeton: Princeton
Univer. Press, 1949.

Jacob, P. E. Changing values in college. New York: Harper, 1957.

Kluckhohn, Florence R. Dominant and variant value orientations. In
C. Kluckhohn, H. A. Murray, & D. M. Schneider (Eds.), Personality in
nature, society, and culture. New York: Knopf, 1953.

McArthur, C. Personality differences between middle and upper
classes. J. abnorm. soc. Psychol., 1955, 50, 247-254.

Riesman, D. The lonely crowd. New Haven: Yale Univer. Press, 1950.


Received February 9, 1959.



JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 5, 1959


SOME LIMITATIONS OF TEACHER RATINGS AS 
PREDICTORS OF CREATIVITY¹


JOHN L. HOLLAND 


National Merit Scholarship Corporation, Evanston, Illinois 


The assessment of students by means of teacher ratings is an
extensive practice which has a number of influences. These
evaluations may affect the student's going to college, obtaining a
scholarship, having a certain self-concept, or feeling that he is
capable of high achievement and creativity. It is important that we
acquire a more adequate knowledge of these ratings so that they can
be made more valuable for both the student and the teacher who
spends valuable time making these evaluations.

Like most ratings, teacher ratings exhibit halo effect and have
ambiguous validities. Since technical efforts to develop better
scales and to train raters have not been generally successful,
another approach seems in order. Ryan (1958) has suggested that it
may be wiser to explore the meaning of ratings through their
empirical correlates, using a wide range of measures, rather than to
hope "for literal measures of traits." In other words, the objective
is not to obtain ratings of well-defined student behaviors, but to
explore the variables which influence the teacher to rate students
favorably or unfavorably. This study reports one such exploration.
Ratings of high school seniors made by their teachers and principals
have been correlated with a variety of personality and achievement
scales, scholastic aptitudes, student reports of their activities
and vocational interests, and demographic information.


¹ This study was partially supported by the National Science
Foundation and the Old Dominion Foundation. The author is indebted
to Donald L. Thistlethwaite and Kent [illegible] for their
constructive reviews of this paper.

PROCEDURE 


The students rated in this study (783 
boys and 394 girls) are 88% of a one-sixth 
random sample drawn from 7500 students 
who were the survivors (finalists) in the 
1958 National Merit Scholarship program. 
The Sixteen Personality Factor Question- 
naire (16 P.F.), the National Merit Stu- 
dent Survey (NMSS), and the Vocational 
Preference Inventory (VPI) were adminis- 
tered by mail to this sample. 

Form A of Cattell’s 16 P.F. test was 
used. This inventory is well known and has 
been described in a number of publications 
(Cattell, 1957; Cattell, Saunders, & Stice, 
1957). The NMSS is an experimental 
achievement inventory devised by the National Merit staff from a
review of the literature. It consists of 10 internally consistent
scales assumed to measure some of the more important personality
and attitudinal
variables which are related to academic 
achievement. These include: Dedication to 
Scholarship, Dependency, Dominance, 
Play, Intellectualism, Introversion, Par- 
ental Press, Persistence, Super Ego, and 
Tolerance for Ambiguity. The VPI, an 
experimental personality inventory com- 
posed of occupational titles, is a revision of 
the Holland Vocational Preference Inven- 
tory which has been described elsewhere 
(Holland, 1958). The VPI consists of the 
following scales: Acquiescence, Infrequency, Physical Activity,
Intellectuality, Responsibility, Conformity, Verbal Activity,
Emotionality, Aggressiveness, Control, Masculinity, and Status. In
addition to these test data, the students listed their activities,
interests, and various background data.

Teachers and principals filled out part 


219 


220 


of an extensive information blank including 
12 graphic rating scales. The rating scales, 
which covered emotional, intellectual, and 
physical traits and aptitudes, were divided 
into six intervals: Among best .01, .05, .10, 
.25, Average, Below Average, or No Observation.
School personnel were instructed
to "rate this student relative to the high 
school seniors you have known during the 
past five years." In general, the ratings were 
made by teachers, principals, or guidance 
workers, but there is no evidence to indi- 
cate which of these groups did most of the 
ratings or how much they collaborated in 
making evaluations. 


RESULTS 


The rating scales were intercorrelated to 
test their independence. The average inter- 
correlation among the 12 scales is .64 and 
.59 for the male and female samples re- 
spectively. The average intercorrelation 
for each of the 12 scales against the 11 re- 
maining scales ranges from .60 to .70 for 
males, and from .51 to .65 for females. It is 
clear that the ratings are closely related 
despite the diversity of the personal quali- 
ties rated. Table 1 shows the intercorrela- 
tions among the rating scales. 

In view of the high intercorrelations 
(which suggest that there is a strong halo 
effect), the variable with the highest aver- 
age intercorrelation for both boys and girls 
was selected as the rating most representa- 
tive of the 12 ratings. This rating, Maturity, 
can be regarded as a measure of the degree 
to which students are rated “high” or 
“low,” and perhaps held in “high” or “low”
esteem by school personnel.

High versus low ratings of Maturity 
were then correlated with student scores 
for each of the following inventories, 

scholastic aptitudes, and background vari- 
ables: Sixteen Personality Factor Question- 
naire, Vocational Preference Inventory, 
National Merit Student Survey, high school 
rank, father's and mother's educational 
level, number of student offices (elected), 
and Scholastic Aptitude Test. The correlations,
which were computed using the
Davidoff and Goheen method for estimating
tetrachoric correlations, are shown in
Table 2. Twenty-four per cent of the cor- 
relations in Table 2 are significant beyond 
the .05 level so that the results probably 
cannot be attributed to chance. 
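
The chance argument above can be illustrated with a small calculation. The sketch below is not part of the original analysis: it assumes that Table 2 holds 88 correlations (44 variables for each sex) and, more strongly, that the 88 tests are independent, which correlated personality scales certainly are not. Under those assumptions it asks how likely it would be to find 24 per cent or more of the correlations significant at the .05 level by chance alone. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """Probability of k or more 'significant' results among n independent tests."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed counts: 44 variables x 2 sexes = 88 correlations in Table 2.
n_correlations = 88
alpha = 0.05

# About 24 per cent of the correlations are reported as significant.
n_significant = round(0.24 * n_correlations)

# Number of "significant" correlations expected if every true correlation were zero.
expected_by_chance = alpha * n_correlations

# Chance probability of seeing at least that many (independence assumed).
tail_probability = binomial_upper_tail(n_correlations, n_significant, alpha)

print(n_significant, expected_by_chance, tail_probability)
```

Because the tests are not independent, the printed probability should be read only as a rough indication that the observed proportion far exceeds the chance expectation.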

To facilitate comprehension of Table 2,
the student characteristics are ordered below
according to their absolute correlations
with Maturity. All student characteristics
which have at least one significant
correlation with the criterion are included.
The first correlation in parentheses refers
to the male sample; the second refers to the
female sample. These correlations are
probably attenuated due to restriction of
range: the average aptitude level and high
school rank for these samples are about i 
standard deviations above the national
norms; the samples are also restricted for a
variety of background variables, including
parents’ education, family income, etc.


No.  Variable (boys, girls)

39   High School Rank (.35, .35)
24   Persistence (.19, .34)
42   Elective Offices (.19, .23)
15   Control (.10, .26)
20   Play (-.15, -.20)
25   Super-ego (.18, .13)
12   Insecure (-.17, -.12)
21   Intellectualism (.14, .15)
 7   Shy (-.14, .14)
43   SAT-Verbal (.1?, .16)
16   Tense (-.12, -.14)
 6   Persistent (.06, .18)
19   Dominance (.00, .21)
28   Infrequency (-.0?, .17)
14   Self-sufficient (.12, -.05)
11   Sophisticated (-.12, .04)

This summary suggests that the boy
receiving high ratings is characterized by
his high grades, persistence, frequent election
to student offices, seriousness, responsibility,
feeling of security, intellectuality,
sociability, high verbal aptitude,
freedom from tension, self-sufficiency, and
lack of sophistication. In contrast, the
boy rated low is characterized by poor
grades, erratic effort, less frequent election
to student offices, playfulness, undependa-


TABLE 1
THE CORRELATIONAL MATRICES FOR TWELVE TEACHER RATINGS
FOR 783 BOYS AND 394 GIRLS

Trait                        1   2   3   4   5   6   7   8   9  10  11  12   r̄
 1. Stability               ..  90  47  64  75  47  44  75  64  62  60  54  62
 2. Maturity                89  ..  58  65  80  56  45  80  59  62  58  65  65
 3. Originality             65  65  ..  62  58  72  84  58  54  50  64  62  61
 4. Drive to achieve        64  64  63  ..  77  52  55  72  48  49  73  67  62
 5. Dependability           79  75  54  74  ..  45  47  87  57  48  52  57  62
 6. Speaking skills         61  64  67  66  54  ..  50  44  67  68  47  47  54
 7. Writing skills          62  62  79  62  57  82  ..  46  37  42  62  47  51
 8. Citizenship             72  72  48  60  82  49  55  ..  66  56  54  61  64
 9. Popularity              72  72  60  45  60  56  57  61  ..  89  49  53  58
10. Social leadership       82  80  60  46  60  67  61  59  88  ..  48  45  56
11. Intellectual leader     64  60  67  76  68  54  68  67  56  53  ..  70  58
12. Physical vigor          61  61  54  66  71  50  70  61  51  51  66  ..  57
    r̄                       70  69  62  62  68  61  65  62  62  64  64  60

Note. Correlations for girls are above the diagonal; correlations for boys are below the diagonal. r̄ is the average
intercorrelation for each variable against the other 11 variables.


TABLE 2
THE RELATION OF TEACHER RATINGS (MATURITY) TO STUDENT PERSONALITY, ACHIEVEMENT,
AND BACKGROUND VARIABLES


Variable                               Boys    Girls

16 P.F.
 1. A  Sociable                        -10      09
 2. B  Intelligent                      09      13
 3. C  Mature                           05      03
 4. E  Dominant                        -03      03
 5. F  Cheerful                        -09     -05
 6. G  Persistent                       06      18*
 7. H  Adventurous                     -14*     14
 8. I  Effeminate                       09      13
 9. L  Paranoid                        -09      02
10. M  Introverted                      03      05
11. N  Shrewd                          -12*     04
12. O  Insecure                        -17**   -12
13. Q1 Radical                         -04     -18
14. Q2 Self-Sufficient                  12*    -05
15. Q3 Controlled                       10      26**
16. Q4 Tense                           -12*    -14

NMSS
17. Scholarship                        -0?      01
18. Dependency                          00      09
19. Dominance                           00      21**
20. Play                               -15**   -20*
21. Intellectualism                     14*     15
22. Introversion                        09      03
23. Parental Press                     -03      02
24. Persistence                         19**    34**
25. Super Ego                           18**    13
26. Tolerance for Ambiguity             09      03

VPI
27. Acquiescence                       -0?      00
28. Infrequency                        -0?      17*
29. Physical Activity                   02      06
30. Intellectuality                     06     -01
31. Responsibility                      04     -02
32. Conformity                          04      00
33. Verbal Activity                     0?     -02
34. Emotionality                        02     -01
35. Aggressiveness                      05     -04
36. Control                             02      13
37. Masculinity                        -09      02
38. Status                              0?      12

Misc. Variables
39. High School Rank (Grades)           35**    35**
40. Father's Education                  03      04
41. Mother's Education                 -02      0?
42. Elective Offices                    19**    23**
43. SAT-Verbal                          1?*     16*
44. SAT-Math                            0?      12

* = .05 level of confidence.
** = .01 level of confidence.

Note. For boys, r.05 = .11, r.01 = .15. For girls, r.05 = .16, r.01 = .21.




bility, insecurity, nonintellectual interests, 
unsociability, low verbal aptitude, tense- 
ness, dependence, and sophistication. The 
characteristics of girls rated high are much
like those found for boys; 12 of the 16 
correlates have similar relationships with 
ratings for both sexes. 

The school and community activities of 
the student samples were also related to 
teacher ratings by categorizing student ac- 
tivities and teacher ratings in 2 X k tables 
and testing their significance. For boys, a 
number of school and community activities 
are significantly associated with ratings. 
Boys with high ratings say that their most 
important extracurricular interests are in 
student government and community ser- 
vice. Boys receiving low ratings are less 
active in these areas and more interested 
in athletics. The results for girls are not 
statistically significant. 

In short, the student rated high by his 
teachers appears to be a bright, persistent, 
conscientious academic achiever and stu- 
dent leader. His personal adjustment is 
characterized by self-control, sense of se- 
curity, and freedom from anxiety. 


Discussion 


In a similar study, Tallent (1956) finds 
that high teacher ratings of self-control for 
secondary school boys are positively correlated
with intelligence test scores. Tallent
also suggests that a rating bias may favor 
students who, according to the subscales of 
the control rating schedule, are distin- 
guished by “ability to persevere at a task, 
carefulness and accuracy of work, tendency 
to think before acting, and . . . preference 
for serious conversation or study to sports 
or active games." This interpretation ap- 
pears congruent with the present results 
which show that high ratings are associated 
with persistence, self-control, and academic 
interests. 

The empirical correlates of these teacher 
ratings are particularly interesting when 
they are examined in relation to our present 




knowledge of achievement and creativity. 
In a series of researches based on Cattell’s 
16 P.F., Cattell (1955), Drevdahl (1956), 
and Drevdahl & Cattell (1958) compared 
creative and noncreative people in samples 
of college students, teachers, adminis- 
trators, scientists, artists, and writers. Their 
findings, which are very consistent for these 
diverse groups, characterize the creative 
person as intelligent, emotionally mature, 
dominant, adventurous, emotionally sensitive
(feminine), introverted, radical, self-sufficient,
tense, unsociable, depressive,
less subject to group standards, and impulsive.
Although five or six of these 13
scale differences appear to be consistent
with the characterization of the student
rated high by teachers in the present study,
most of Cattell and Drevdahl’s findings 
describe a person who is in many respects 
the opposite of the person teachers seem 
to prefer. In Cattell’s words: “. . . these are
not characteristics of a pleasant personality,
differing markedly from those shown for the
successful salesman or the elected, popular
leader . . .” Similar evidence for this interpretation
has been obtained by Getzels
and Jackson (1958) in a study of “intelligent”
versus “creative” high school students.
They find that creative students are
characterized by their use of stimulus-free,
humorous, and playful themes in their
fantasy productions. Moreover, these investigators
also report that teachers preferred
the intelligent to the creative students,
although both groups are about equal
in school performance (grades).

In an extensive series of researches on
effective work performance and creativity,
Barron (1957), Woodworth (1958), and
others describe the creative person in terms
which seem to support some of the Cattell
and Drevdahl findings, especially the impulsive,
radical, and dominant qualities
attributed to creative persons.

These differences suggest, then, that the
use of teacher ratings is an inefficient and
perhaps inadequate method for selecting


potentially creative persons. Although high
teacher ratings have some correlates which
are similar to a number of the characteristics
associated with creativity, such as
persistence, high academic aptitudes, and
perhaps dominance, most of the correlates
are more indicative of potential for leadership
or academic achievement than of
creativity.

Practically, these results suggest that 
colleges, scholarship sponsors, and other 
organizations interested in the selection of 
potentially creative persons should use
teacher ratings in a selective fashion. For
example, they might be used as estimates
of persistence, dominance, and high academic
aptitudes in combination with selected
test variables predictive of creative
performance. Only a limited reliance on teacher
ratings as predictors of creativity appears
desirable, however, since teachers seem to
prefer students whose potential for creative
work, as measured by tests at least, is
less than that of students rated low.


Summary 


The significance of teacher ratings has
been explored by correlating a representative
teacher rating, Maturity, with a
variety of personality, achievement, and
background variables for a large national
sample of high ability high school seniors.
Interpreted with reference to findings
from other studies, the results suggest that
teacher ratings are potentially more useful
as predictors of academic achievement and
leadership potential than as predictors of
creativity.




REFERENCES 

Barron, F. Originality in relation to personality
and intellect. J. Pers., 1957, 25,
730-742.

Cattell, R. B. Personality and motivation
structure and measurement. New York:
World Book, 1957.

Cattell, R. B., & Drevdahl, J. E. A comparison
of the personality profile (16
P.F.) of eminent researchers with that
of eminent teachers and administrators,
and of the general population. Brit. J.
Psychol., 1955, 46, 248-261.

Cattell, R. B., Saunders, D. R., & Stice,
G. Handbook for the sixteen personality
questionnaire. Champaign, Ill.: Institute
for Personality and Ability Testing,
1957.

Drevdahl, J. E. Factors of importance for
creativity. J. clin. Psychol., 1956, 12,
21-26.

Drevdahl, J. E., & Cattell, R. B. Personality
and creativity in artists and
writers. J. clin. Psychol., 1958, 14, 107-111.

Getzels, J. W., & Jackson, P. W. The
highly creative and the highly intelligent
adolescent: An attempt at differentiation.
Amer. Psychologist, 1958, 13,
336. (Abstract)

Holland, J. L. A personality inventory
employing occupational titles. J. appl.
Psychol., 1958, 42, 336-342.

Ryan, F. J. Trait ratings of high school
students by teachers. J. educ. Psychol.,
1958, 49, 124-128.

Tallent, N. Behavioral control and intellectual
achievement of secondary school
boys. J. educ. Psychol., 1956, 47, 490-503.

Woodworth, D. G. A factorial study of
trait-rankings used in an assessment of
professional research scientists. Paper
read at Western Psychological Association,
Monterey, California, 1958.


Received February 24, 1959. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 5, 1959 


A STUDY OF WORDS INDICATING FREQUENCY 


DAVID R. STONE AND RICHARD T. JOHNSON


Utah State University 


In past years attempts have been made 
to find dimensions of meaning in the field of 
semantics and psycholinguistics. Studies
have been somewhat sporadic, but recent
developments indicate a broadening interest.
In an article on the psychophysics of
semantics, Jones and Thurstone (1955) ap- 
plied the method of successive intervals to 
describe a continuum of meaning from 
“greatest dislike" to “greatest like." Mosier 
(1941) used the same method to establish a 
continuum of meaning through a “favora- 
ble-neutral-unfavorable" series. Cliff (1959) 
also used this procedure with 9 adverbs and 
15 adjectives in determining a series of 
scale values from “extremely nice" to 
“decidedly bad." Other work in this area 
may be found in Simpson (1944), who at- 
tempted to organize frequency words using 
percentages, Osgood, Suci, and Tannen- 
baum (1957), who have measured meaning 
change with a “semantic differential,” and 
Cohen, Dearnley, and Hansel (1958) in 
England. General background for the en- 
tire field appears in Miller’s (1954) article 
on psycholinguistics. 

The study reported here uses the suc- 
cessive interval approach as outlined by 
Edwards (1952, 1957) with frequency words 
varying from “none of the time" to “all of 
the time." 

Frequency itself is a rather small dimension
of meaning, but one that is commonly used.
The abundance of such terms in educational
measurement and the consequent confusion
as to their meanings point to the need for
study in this area. For example, to decide
whether an educational practice should be
advocated “usually” or “occasionally” can
be of critical importance. Analysis of the
function of such words in test questions in
educational measurement seems to have
been neglected.


PROCEDURE 


Over a period of years, 48 terms having 
at least an element of frequency meaning 
were collected from test questions supplied 
for courses in educational psychology. The 
list included words which have implications 
concerning “certainty” and “time” as
well as frequency. Words appear in this 
fashion in actual questions. For example, 
“was” may be used with a time implication 
as well as a frequency implication. 

Instructions to the subjects were as follows:


In the word list on the next page are words
and phrases which are used to refer to frequency
of occurrence. For each word or
phrase, make a check mark to show what
frequency the word or phrase would mean
to you if you read it in a book; that is, how
many times you think the word implies or
indicates.

Example: Suppose you read that an event
happens “sometimes” or a “few times.”
Mark the appropriate place for each word
on one of the blanks on the line opposite
the word.

    Words          None of      Half of      All of
                   the time     the time     the time
    Sometimes
    Few Times

On the next page are words used to express
judgments of how often or how frequently
something can occur. They range
from “none of the time” to “all of the time,”
with “half of the time” in the center of the
scale.

Read through the entire scale before you
begin to mark your choices. Work carefully,
but do not study each word too long. Record
your first impression of the meaning as you
would interpret it if you read it. If a word
seems to have more than one meaning,
choose the meaning most important.

If you feel it is impossible to rate the
frequency of a particular word, leave it
until you have done the other words, then
come back and make the best estimate you
can.

Mark one space for each word.


The subjects (Ss) were 158 students in
general and educational psychology, including
freshmen through seniors, registered
in the fall of 1958. The procedure gave
Ss the full list of terms and suggested
they carefully read it through before
marking any item.
The use of “anchor” words was discussed
by Bendig (1955), to the effect that
rater reliability increases as a function of
the heterogeneity of the stimuli. The anchor
words (or phrases) used here, “none of the
time” to “all of the time,” are, of course,
very heterogeneous. A center phrase was
also used. The merit of presenting all
stimuli together, as was done here, or
separately for these kinds of stimuli has not
been settled.
Ss were asked to use a single choice for
each item and not to omit any item. This
was desired since it was assumed that while
some terms were inexact they were not
without meaning.

All responses, no matter how deviate,
were included. The intent was to parallel a
class situation which would include some
carelessness, but also some occasional
“rare” interpretations. Divergent responses,
it was discovered, sometimes have
a very real meaning to the student. For
example, the very few subjects who marked
“never” in the “all the time” category were
included in the study, since in class discussion
a student had said that to him “never
is always never, so I marked it always.”
Such discussion took place after the questionnaires
had been collected.

The method of successive intervals as
given by Edwards (1952, 1957) was applied
to the cumulative choices for each
item. Scale values and discriminal
dispersions were then calculated. Each item
was also plotted on normal probability paper
as a check for the calculations.
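
The general logic of successive-interval scaling can be sketched in a few lines of code. The sketch below is a simplified illustration and not necessarily the computing routine given by Edwards: it assumes each item's judgments are normally distributed on the underlying continuum, converts cumulative proportions to normal deviates, estimates the spacing of the category boundaries from the average difference between adjacent deviates, and then fits a straight line for each item (the numerical counterpart of the normal probability paper plot). The intercept of that line is taken as the scale value and the slope as the discriminal dispersion. The function name `successive_intervals`, the clipping constant, and the demonstration data are all assumptions made for illustration.

```python
from statistics import NormalDist, mean


def successive_intervals(cum_props, clip=0.01):
    """Estimate category boundaries and (scale value, dispersion) for each item.

    cum_props[i][j] is the proportion of judges who placed item i at or below
    category boundary j. Proportions are clipped to avoid infinite deviates.
    """
    inv = NormalDist().inv_cdf
    z = [[inv(min(max(p, clip), 1 - clip)) for p in row] for row in cum_props]
    n_bounds = len(z[0])

    # Average distance between adjacent boundaries, taken across items.
    widths = [mean(row[j + 1] - row[j] for row in z) for j in range(n_bounds - 1)]
    bounds = [0.0]  # the lowest boundary is the arbitrary origin
    for w in widths:
        bounds.append(bounds[-1] + w)

    # For each item fit: boundary = scale value + dispersion * deviate.
    t_bar = mean(bounds)
    results = []
    for row in z:
        z_bar = mean(row)
        sxx = sum((x - z_bar) ** 2 for x in row)
        sxy = sum((x - z_bar) * (t - t_bar) for x, t in zip(row, bounds))
        dispersion = sxy / sxx
        results.append((t_bar - dispersion * z_bar, dispersion))
    return bounds, results


# Hypothetical demonstration: three items with known scale values and unit
# dispersion, judged against four category boundaries.
true_bounds = [0.0, 1.0, 2.0, 3.0]
true_scales = [1.0, 1.5, 2.0]
demo = [[NormalDist(s, 1.0).cdf(t) for t in true_bounds] for s in true_scales]
bounds, fitted = successive_intervals(demo)
```

With error-free data of this kind the fitted scale values reproduce the values used to generate the proportions; with real judgments, departures from a straight line show up as the skewness and bimodality discussed below.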


The items on the questionnaire were pre- 
sented in alphabetical order. They are pre- 
sented here in order of magnitude along 
the continuum from “never” to “always.” 


RESULTS AND DISCUSSION 


The scale values and standard deviations 
for the items are given in Table 1, with 
starred values referring to one possible 
selection of terms for a nine-step con- 
tinuum. 

More often than not, the plotted scores 
were linear, thus supporting the general 
hypothesis of this method. Three kinds of 
variations in the data were noted: (a) words 
having extreme frequency meanings showed 
relatively higher dispersions, (b) some words 
displayed skewness, and (c) some displayed 
bimodality. Phrases at the ends of the scale 
showing relatively high dispersions were 
“always,” “cannot”, “hardly ever," “does 
not” and “is always.” Phrases showing 
some skewness were “more often than 
not,” “now and then,” “requires,” “sel- 
dom,” “sometimes,” and “unpredictable.” 
Bimodal terms were, “a possibility exists,” 
“are,” “can,” and “will.” 

The average error or discrepancy score 
for the data was .06. This was determined 
by calculating the average difference be- 
tween the theoretical cumulative propor- 
tions and the actual empirical cumulative 
proportions for the items. This error of .06 is
somewhat larger than the .02 to .03 usually 
reported. We assume that the broader 
scope given in the directions, and the de- 
liberate inclusion of words with frequency 
plus other meaning dimensions, accounts 
for this. 
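
As an illustration of the discrepancy score just described, the following sketch computes the average absolute difference between theoretical and empirical cumulative proportions for a single item. It assumes a normal model with a given scale value and dispersion; the function name `average_discrepancy` and the example numbers are hypothetical and are not the study's data.

```python
from statistics import NormalDist


def average_discrepancy(empirical, bounds, scale, dispersion):
    """Mean absolute gap between model and observed cumulative proportions."""
    model = NormalDist(scale, dispersion)
    theoretical = [model.cdf(t) for t in bounds]
    gaps = [abs(t - e) for t, e in zip(theoretical, empirical)]
    return sum(gaps) / len(gaps)


# Hypothetical item: boundaries at 0, 1, 2; scale value 1; dispersion 1.
example_bounds = [0.0, 1.0, 2.0]
exact = [NormalDist(1.0, 1.0).cdf(t) for t in example_bounds]

# Observed proportions that each miss the model by .06.
shifted = [p + 0.06 for p in exact]

perfect_fit = average_discrepancy(exact, example_bounds, 1.0, 1.0)
poorer_fit = average_discrepancy(shifted, example_bounds, 1.0, 1.0)
```

Averaging this quantity over all items would give a single figure comparable to the error reported above.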

The larger dispersion of words or phrases 
at the extremes is related to the methodol- 
ogy. Except for “more often than not,” 
and “requires,” skewness seems to be as- 
sociated with an arbitrary limit below the 
“as often as not,” the center category. Bi- 
modality indicates two meanings for a 
term. The instructions possibly limited this 
trend from appearing more, since the Ss 


TABLE 1
SCALE VALUES AND STANDARD DEVIATIONS FOR ALL STIMULUS ITEMS

Item                          Scale value    SD
 *1. never                        .03        .90
  2. was never                    .15       1.12
  3. cannot                       .17       1.80
  4. does not                     .24       1.70
  5. hardly ever                  .81       2.42
 *6. almost never                 .85        .50
  7. hardly                       .91       1.32
  8. very seldom                  .96        .50
  9. seldom                      1.12       1.20
 10. not often                   1.32        .51
*11. infrequently                1.34        .60
 12. a possibility exists        1.57       1.31
 13. few                         1.74        .90
 14. now and then                1.82       1.08
 15. once in a while             1.86        .85
*16. occasionally                2.11        .70
 17. perhaps                     2.12        .92
 18. possibly                    2.21        .79
 19. sometimes                   2.46        .99
 20. has                         2.51       1.34
 21. unpredictable               2.71        .60
 22. is sometimes                2.78        .64
*23. as often as not             2.87        .61
*24. 50-50                       2.87        .41
 25. may                         2.94        .14
 26. was                         3.13       1.58
 27. appears to be               3.18        ?
 28. should                      3.35        ?
*29. more often than not         3.40        ?
 30. requires                    3.41       1.03
 31. probably                    3.56        .87
 32. can                         3.65       1.07
 33. characteristically          3.72        .82
 34. rather often                3.73        ?
 35. considerably                3.75        .63
 36. frequently                  3.78       1.08
*37. usually                     3.83        ?
 38. generally                   3.86        ?
 39. often                       3.91        ?
 40. many                        3.97        ?
 41. was usually                 4.06        ?
 42. very often                  4.35        ?
*43. practically always          4.40        .92
 44. is                          4.84        ?
 45. will                        4.86       1.26
 46. are                         4.94        ?
 47. is always                   5.19        ?
*48. always                      5.22        ?

Note. Starred values refer to one possible selection of terms for a nine-step continuum.


were asked to choose only the most com- 
mon meaning of an item. 

The most important part of this study is 
the suggestion of the importance of fre- 
quency words as an area for further study. 
Obviously, frequency words are not in- 
tended to convey percentage of occurrence 
in our language. The principal value of work 
on frequency would seem to be in establish- 
ing more consistent relative meanings 
rather than a series of absolute meanings. 

Another consideration is the fact that 

frequency meaning occurs in the same word 
with other dimensions of meaning, such as 
time and uncertainty. This is simply a 
characteristic of the language, but can be 
minimized by careful selection of words. 
The use of forms of the word “is,” where 
frequency is unstable and often neglected 
or unstated in test questions, poses a par- 
ticular problem. 




Three kinds of application are suggested.
First, a list of frequency words with their
semantic variations could be compiled for
reference. As an example, class discussion
revealed that, in some cases, the background of
associated ideas determines that “many”
can mean “few.” Thus, when three pounds
are lost for an adult, it is “few,” but for a
child, it would be “many.” Some meanings
depend on context and vocal emphasis, as
in “can” and “will.” Second, the need for
such classification could be shown by reference
to actual test questions. For example,
in the True-False question, “Changes in
IQ over a long term reflect an error of
measurement rather than an actual trend
of mental development,” if the reader assumes
the frequency “sometimes,” the
question is true, but if he assumes “always,”
it is false. In the question, “An accurate
estimate of typical performance is


obtained by observing the pupil in a typical 
situation,” the student wonders whether
“is” means “is always,” or “is sometimes,”
or some other unstated frequency. For the
question, “A pupil's motivation may be
so strong as to interfere with good mental
test performance,” the word “may” necessitates
a different frequency assumption
than the unstated frequency of “is” in the
previous question. Third, a logical analysis
of intervals could be used as another means
of clarifying and developing a frequency
continuum, which would be more than
a simple scaling of how certain Ss now
respond to the terms. One might begin a
process of finding logical differences in
meaning by taking “never” as in the first
interval. Then in a “mutually exclusive”
way, see if another term can share the
interval, or whether a new one must be
used to accommodate the next logical dimension.
Note that when “more often than
not” is introduced into a situation which
has been previously restricted to “always,”
“never,” “many,” and “few,” the quality
of a discussion changes. This forms a
useful basis for later discussion of statistical
concepts of significance.


SUMMARY 


A series of 48 frequency words or phrases
was selected from classroom tests. A group
of subjects was directed to rate them on a
scale from “none of the time” to “all of
the time.”

Scale values and dispersions were calculated
for each item, and they were arranged
on a continuum of successive intervals.



From the results of this study it is possi- 
ble to select items for use in classroom test 
questions which can more consistently 
represent relative meaning for concepts of 
frequency. 


REFERENCES 


Bendig, A. W. Rater reliability and the
heterogeneity of the scale anchors. J.
appl. Psychol., 1955, 39, 37-39.

Cliff, N. Adverbs as multipliers. Psychol.
Rev., 1959, 66, 27-44.

Cohen, J. A., Dearnley, E. J., & Hansel,
C. E. M. A quantitative study of meaning.
Brit. J. educ. Psychol., 1958, 28,
141-148.

Edwards, A. L. The scaling of stimuli by
the method of successive intervals. J.
appl. Psychol., 1952, 36, 118-122.

Edwards, A. L. Techniques of attitude scale
construction. New York: Appleton-Century-Crofts,
1957.

Jones, L. V., & Thurstone, L. L. The
psychophysics of semantics: An experimental
investigation. J. appl. Psychol.,
1955, 39, 31-36.

Miller, G. A. Psycholinguistics. In G.
Lindzey (Ed.), Handbook of social
psychology. Cambridge, Mass.: Addison-Wesley,
1954. Pp. 693-708.

Mosier, C. I. A psychometric study of
meaning. J. soc. Psychol., 1941, 13, 123-140.

Osgood, C. E., Suci, G. J., & Tannenbaum,
P. H. The measurement of meaning.
Urbana: Univer. Illinois Press,
1957.

Simpson, R. H. The specific meanings of
certain terms indicating differing degrees
of frequency. Quart. J. Speech,
1944, 30, 328-330.


Received March 7, 1959.


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 5, 1959 


A NOTE ON SOME ABILITY CORRELATES OF THE RAVEN 
PROGRESSIVE MATRICES (1947) IN THE KINDERGARTEN 


DALE B. HARRIS 


Pennsylvania State University 


Burke’s comprehensive review (1958) of 
the literature on the Raven test shows sev- 
eral studies relating the measure to the 
Stanford-Binet, the Wechsler-Bellevue, the 
WISC, and an occasional group test. Vir- 
tually no studies have appeared to examine 
the test’s relationship to specific abilities 
other than those measured by Wechsler 
subtests, or to assess its usefulness with 
young children. 

The review also casts some doubt on the 
assertion that the Raven is virtually a pure 
measure of g. Also, correlations with the 
Binet have generally been lower than that 
reported in the test manual (r = .65), but 
correlations with performance measures 
tend to be slightly higher than with verbal 
measures. 

The subjects of this study¹ were 98
kindergarten children, 45 boys and 53 girls 
aged 5-1 to 6-1, selected to represent the 
urban population of the United States by 
parental occupation distributed on the 
Minnesota Scale for Parental Occupations, 
These children were given individually the 
Raven Progressive Matrices (1947), the 
SRA Primary Abilities Test, and the Good- 
enough Draw-a-Man Test by three trained 
and experienced examiners. All testing was 
accomplished within the period of one 
month. 

Means and standard deviations for the 
several tests appear in Table 1. Reference 
to test norms shows that this group is very 
close to typical performance for its age 
(5-6) on all measures. It is at the 6-year 
level on Raven’s norms. Table 2 presents
the product-moment intercorrelations of
the measures. Correlations calculated separately
by sex showed no marked and certainly
no consistent patterns of differences
and are omitted.

¹ Appreciation is due Nolan Kearney,
Assistant Superintendent of the St. Paul
schools, who arranged for the testing.

In the present study, there is only a weak
relationship to any of Thurstone’s (1953)
factors measured in kindergarten children.
Indeed, the Thurstone factors intercorrelate
for the most part more substantially than
does the Raven with any of them. Reference
to the three available studies relating
the Raven to the WISC shows marked discrepancies
in results among elementary
school children. There is some tendency for
the Raven to correlate with ek 
Block Design, and possibly Lie a ae 
more substantially than with Comprehension,
Coding, Mazes, and Object Assembly.
The odd-even split-half reliability (corrected)
of the Raven with kindergartners
in the present study is .466. This low value
would certainly attenuate the relationships
observed in Table 2.
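
The two ideas in this passage, the corrected split-half reliability and the attenuation of observed correlations, can be made concrete with a brief sketch. It assumes that the correction referred to is the Spearman-Brown formula and uses the standard correction for attenuation; the function names are hypothetical, the observed correlation of .36 is used only as an illustrative value, and the second test is treated as perfectly reliable simply because its reliability is not reported here.

```python
from math import sqrt


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def half_from_full(r_full):
    """Inverse of the Spearman-Brown step-up: the implied half-test correlation."""
    return r_full / (2 - r_full)


def disattenuate(r_xy, rel_x, rel_y=1.0):
    """Estimate the correlation between true scores from an observed correlation."""
    return r_xy / sqrt(rel_x * rel_y)


corrected_reliability = 0.466                       # value reported in the text
implied_half_r = half_from_full(corrected_reliability)

# Illustrative only: an observed correlation of .36 between the Raven and
# another measure whose reliability is assumed (not reported) to be 1.0.
estimated_true_r = disattenuate(0.36, corrected_reliability)
print(round(implied_half_r, 3), round(estimated_true_r, 3))
```

The point of the sketch is only that a reliability below .50 can depress observed correlations substantially; it does not establish what the corrected values for this sample would be.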
In preliminary work for this study, it
became apparent that for kindergartners
some of the British expressions in Raven’s
manual had to be modified slightly. Such
changes, consistently applied, brought
quicker understanding of the task. In addition,
the question, “Is that the right one
to go in here,” used to direct attention and
insure a more careful examination of the
materials, actually seems to shake self-confidence,
an effect which then persists
throughout the measure. Probably the directions
need more extensive revision.
The test appears to be graduated in difficulty
and there was consequently considerable
waning of interest and enthusiasm,
especially in the B series. The
test proved difficult for five- to six-year-olds,
especially those in the average range
of ability and below.



TABLE 1
MEANS AND STANDARD DEVIATIONS OF SCORES ON THE RAVEN, THE THURSTONE PMA,
AND THE DRAW-A-MAN TESTS, KINDERGARTEN CHILDREN

                              Boys (N = 45)    Girls (N = 53)   Total (N = 98)
                              Mean     SD      Mean     SD      Mean     SD
Raven                         16.24    3.0     ?        ?       ?        ?
Primary Mental Abilities:
  Verbal Meaning              29.84    7.9     30.34    7.0     30.11    7.4
  Perceptual Speed             9.36    6.4     11.04    6.9     10.27    6.6
  Quantitative                11.44    6.0     11.11    6.2     11.27    6.1
  Motor                       21.56    8.8     25.40   10.7     23.63   10.1
  Space                       10.53    5.3     12.45    4.9     11.57    5.2
Draw-a-Man (a)                12.69    ?       15.50    6.4     14.21    6.3

(a) Values reported here are in terms of a restandardization and extension of the Goodenough test to be published
shortly. The correlation of the old and new scales is .94.


TABLE 2
INTERCORRELATIONS* OF SCORES ON THE RAVEN, PMA, AND DRAW-A-MAN TESTS,
KINDERGARTEN CHILDREN

                              Raven    V      P      Q      Mo     S
Primary Mental Abilities
  Verbal Meaning (V)           .32
  Perceptual Speed (P)         .23    .64
  Quantitative (Q)             .36    .70    .60
  Motor (Mo)                   .22    .34    .42    .31
  Space (S)                    .34    .57    .60    .59    .60
Draw-a-Man (D-a-M)             .22    .50    .44    .54    .40    .51

* r = .25 significant at .01 level.


REFERENCES 


Burke, H. R. Raven's Progressive Matrices:
A review and critical evaluation.
J. genet. Psychol., 1958, 93, 199-228.

Minnesota scale for paternal occupations.
Minneapolis: Univer. of Minnesota,
Inst. of Child Developm. & Welfare
(undated).

Raven, J. C. Guide to using progressive
matrices (1947). New York: Psychological
Corp., 1947.

Thurstone, Thelma G., & Thurstone,
L. L. Examiner manual for the SRA
Primary Mental Abilities (ages 5 to 7).
Chicago: Science Research Associates,
1953.


Received March 18, 1959. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 5, 1959 


INTERPRETATION OF RELIABILITY 
AND VALIDITY COEFFICIENTS: 


REMARKS ON A PAPER BY LORD 
LEE J. CRONBACH AND GOLDINE C. GLESER 


University of Illinois¹


It was formerly held that only test scores 
with high reliability and validity were 
practically useful. Taylor and Russell 
(1939) were the first of many writers to 
modify this viewpoint by pointing out con- 
ditions under which a test can make a sub- 
stantial contribution even though its valid- 
ity or reliability is low. We have reviewed 
many of these papers elsewhere (Cronbach 
and Gleser, 1957). 

Lord (1958) has recently contributed a 
valuable paper on the usefulness of un- 
reliable difference scores. Two comments 
are to be made about this paper. a) Lord's 
method of analysis is general and applies 
to all reliability and validity coefficients; 
his paper therefore has implications far
beyond the interpretation of difference
scores. b) Modifying Lord's evaluation
procedure in one particular leads to an 
important change in the conclusions. For 
many decision makers Lord's formulation 
is less suitable than the alternative anal- 
ysis, and his interpretation regarding the 
value of tests is insufficiently conservative. 

Most statements describing the usefulness
of tests as judged from their reliability
or validity coefficients assume that a de- 
cision is made about every person tested. 
The Cooperative Test Division (1955) of 
ETS, in making recommendations to inter- 
preters of certain aptitude and achievement 
batteries, adopts the contrary position that 
decisions might better be made only about 

persons for whom the test provides de- 
pendable information. Bloom (1942) noted 
that even an unreliable test permits one to 


¹ This study was aided by a USPHS grant.
The comments of F. M. Lord on a
draft of this paper are gratefully acknowledged.


divide a group into a few broad categories
with considerable confidence. In applying
this concept to differences between subtests
of the SCAT and STEP batteries, the
CTD suggests that, where there is a large
difference, a test permits an accurate inference
that one true score is higher than
another, even though the difference score
has quite modest reliability. For persons
with small observed differences, on the
other hand, the CTD suggests that the
best course of action is to make no differential
interpretation. Specifically, the following
rule is proposed: If a difference score
is larger in absolute value than k, interpret
it as a true difference; if it is less than k,
act as if there is no difference, at least until
further information about the person is
taken into account. Considering the V and
Q scores for SCAT, for example, this strategy
calls for assigning the person to one of
three groups: V > Q, V < Q, and no difference
established.


STRATEGY WITH FIXED α RISK


The value of k may be determined in
many ways. The CTD proposal makes k
proportional to the standard error of measurement
of the difference score; specifically,
k is set equal to √2 S.E._d. Mendenhall
(1959) adopts a similar approach,
setting k equal to 1.96 S.E._d. We
shall refer to a strategy which makes the
cutting score a multiple of S.E. as a "fixed
α" strategy, for reasons which will be
made clear shortly.

The most obvious virtue of the CTD rule
is the convenience with which it may be
applied. It is recommended for such batteries
as SCAT and STEP that a score
be plotted on the profile sheet not as a
point, but as a band extending 1 S.E.
above and below the observed score. A
difference larger than √2 S.E._d will be
present only when the bands plotted do not
overlap. The counselor is instructed to
interpret only differences where the pupil's
V-score and Q-score bands show no overlap
(i.e., if V is plotted as a band 40-50
and Q as a band 48-58, no interpretation
is made).
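The band rule can be checked mechanically. The following is a minimal sketch in Python, assuming both scores carry the same standard error of measurement (so that 2 S.E. equals √2 S.E._d) and that bands which merely touch count as overlapping. The function names `bands_overlap` and `interpret_difference` are illustrative and are not taken from the CTD manual.

```python
def bands_overlap(score_a: float, score_b: float, se: float) -> bool:
    """Return True when the +/- 1 S.E. bands around two scores overlap."""
    # Each band runs from (score - se) to (score + se); two such bands
    # overlap exactly when the scores differ by no more than 2 * se.
    return abs(score_a - score_b) <= 2 * se


def interpret_difference(v: float, q: float, se: float) -> str:
    """Apply the three-group rule: V > Q, V < Q, or no difference established."""
    if bands_overlap(v, q, se):
        return "no difference established"
    return "V > Q" if v > q else "V < Q"
```

With the bands quoted in the text (V plotted as 40-50, Q as 48-58, i.e. scores of 45 and 53 with S.E. = 5), `interpret_difference(45, 53, 5)` reports that no difference is established.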

Lord's paper is devoted to an evaluation
of this strategy. He takes into account two
consequences of the rule, as applied to a
difference score with a specified reliability:
the proportion of persons about whom differential
interpretations are made (p), and
the average risk (q_c) of making a differential
interpretation when the true difference is in
the opposite direction. For example, when
r_dd = .42, and difference scores are converted
to a scale with unit standard deviation,
S.E._d = .76 and k = 1.07. Then 28%
of the subjects have differences greater
than 1.07, and in 90% of those cases the
observed difference is in the same direction
as the true difference. Hence, the average
risk of an incorrect differential interpretation
is 10% when the CTD rule is applied
to this test.

This argument (like Bloom's) sets aside
the conclusion of Bennett and Doppelt (1948)
that the minimum acceptable reliability
for a difference score is about .75,
which was based on Kelley's (1923)
calculation of "the proportion of differences
in excess of chance." Since this proportion
does not relate in any direct way to the
correctness of decisions based on the difference
score, as a basis for evaluating a test
it is much inferior to Lord's, which has a
clear relation to the utility of decisions.
Lord concludes that the CTD rule is an
acceptable one, as the average risk is low
even when the score has a reliability as low
as […]. Indeed, he points out that if r_dd >
[…], the average risk is extremely low, so
that the counselor ignores differences which
could very safely be interpreted. He implies
that a better strategy would be to adjust k
so as to maintain a fixed average risk, no
matter what the score reliability. This may
be referred to as a "fixed q_c" strategy, and
the difference between fixed α and fixed
q_c strategies may be explained with reference
to Fig. 1.

In this sketch, x is any score which is to
be interpreted, and y is the score with
respect to which persons would ideally be
classified (criterion score, true score, or
true difference score). Persons are to be
identified who may confidently be classified
as having y > y' or y < y'. In the problem
of identifying nonzero differences, y' = 0.
Persons for whom x > k are classified as
having y > y'. Under the fixed α strategy,
k is placed on the x scale at a distance from
zero determined by S.E. The standard
error is the standard deviation of any
horizontal array. The line x = k cuts off a
certain proportion of persons in the array
where y = 0, i.e., where the null hypothesis
is true. We may refer to this proportion as
α/2, recognizing that there are an equal
number of cases where y = 0 and x < −k.
Then α is the risk of incorrectly making a
differential interpretation when the null
hypothesis holds. Setting k equal to a fixed
multiple of S.E. has the effect of holding
α constant as r_xy varies. Specifically, when
k = √2 S.E., α is fixed at .16.
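The arithmetic behind these figures can be reproduced with the normal distribution in Python's standard library. This is a minimal sketch, assuming normally distributed errors and a difference score scaled to unit standard deviation (so that S.E. = √(1 − r)); the function names `alpha_risk` and `proportion_interpreted` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def alpha_risk(multiplier: float) -> float:
    """Two-tailed chance of |x| > multiplier * S.E. when the true difference is zero."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(multiplier))


def proportion_interpreted(reliability: float, multiplier: float) -> float:
    """Share of persons with |x| > k, for x scaled to unit standard deviation."""
    k = multiplier * sqrt(1.0 - reliability)  # S.E. = sqrt(1 - r) on this scale
    return 2.0 * (1.0 - STD_NORMAL.cdf(k))
```

Under these assumptions `alpha_risk(sqrt(2))` rounds to .16 and `alpha_risk(1.96)` to .05, and `proportion_interpreted(0.42, sqrt(2))` rounds to .28, the 28% quoted for r_dd = .42.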

The average risk q_c, which Lord uses to
evaluate the fixed α strategy, takes into
account all cases for whom a decision is
reached, i.e., all cases where x > k or x <
−k. Those in areas B and C of Fig. 1 are
erroneously interpreted. The proportion of
decisions p reached is the sum of the volumes
under the normal bivariate distribution,
p_A + p_B + p_C + p_D. The proportion
of correct decisions p_c = (p_A + p_D)/p and
the average risk q_c = (p_B + p_C)/p.

FIG. 1. SKETCH TO ILLUSTRATE α RISK AND AVERAGE RISK q_c
(axes: observed score x, true or criterion score y)
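The average risk can be approximated numerically. The sketch below is one possible calculation, not Lord's own procedure: it assumes the true-score model with y' = 0, a bivariate normal distribution, x scaled to unit standard deviation, and a reliability strictly below 1. The function name `average_risk` and the midpoint-rule integration are choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def average_risk(reliability: float, multiplier: float, steps: int = 4000) -> float:
    """Approximate q_c for cutoffs at +/- multiplier * S.E. (true-score model, y' = 0)."""
    rho = sqrt(reliability)                   # correlation of observed with true score
    k = multiplier * sqrt(1.0 - reliability)  # cutoff on the unit-variance x scale
    slope = rho / sqrt(1.0 - rho * rho)       # P(y < 0 | x) = Phi(-slope * x)
    upper = k + 8.0                           # density beyond this point is negligible
    width = (upper - k) / steps
    wrong = 0.0
    for i in range(steps):                    # midpoint rule over the upper tail
        x = k + (i + 0.5) * width
        wrong += STD_NORMAL.pdf(x) * STD_NORMAL.cdf(-slope * x) * width
    decided = 1.0 - STD_NORMAL.cdf(k)         # upper-tail share of all decisions
    return wrong / decided                    # the lower tail is symmetric, so the ratio is unchanged
```

Called as `average_risk(0.42, sqrt(2))`, this should come out close to the 10% average risk reported for r_dd = .42.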

Lord suggests that the value of k might
be adjusted, as r_xy changes, to keep q_c
constant. This would require that k be a
smaller multiple of S.E. as r increases.
Though Lord appears to regard this fixed
q_c strategy as superior to the fixed α
strategy, he does not discuss it in detail,
and we shall give it no further direct attention.

Our paper differs from Lord's in placing
emphasis upon the maximum risk of erroneous
interpretation, rather than upon the
average risk. It is obvious that the risk of
a wrong decision is greater for the person
whose observed score is near the cutting
point k than for the person with an extreme
score. Looking at only the average risk, as
Lord does, one may conclude that a procedure
is conservative even when appreciable
risks are taken in making decisions
about persons near the borderline. The
CTD rule proposes to interpret differences
which, considered individually, are quite
likely to be due to chance. Specifically, in
the example considered above where r_dd =
.42, Lord reports an average error rate of
10%, but we find that for persons with
differences near k the expected error rate is
18%. Some users who would be quite prepared
to accept 1 erroneous interpretation
in 10 would not consider an error rate close
to 1 in 5 as adequately conservative.

Though average risk is sometimes an
appropriate loss function to use in evaluating
a strategy, maximum risk is more
appropriate in other situations. Arbous and
Sichel (1952) and Arbous (1952) have compared
the two in discussing industrial
selection. They point out that a test of low
validity may profitably be used as the first
stage in a sequential procedure for selecting
employees, where unpromising applicants
are ruled out and the remainder are given
a further test. The benefit of this procedure
to the institution (employer) may properly
be judged in terms of the average quality
of the men finally selected and the cost of
testing per man hired. Such emphasis on
average risk is not, however, appropriate
from the individual's viewpoint. Arbous
and Sichel protect the interests of the
individual by fixing a cutting score x' on the
pretest such that the maximum risk φ of a
false decision (rejection of a man who would
pass the second test) is .001 or some other
suitable value. Scores near the cutting
point x' can then be interpreted at a
predetermined level of confidence, and more
extreme scores can be interpreted with
even greater confidence. This is preferable
to a strategy yielding a specified average
risk whenever individual rather than
institutional decisions are being made, since
the risk for any single individual is limited.
As discussed by us elsewhere (1957, p. 9),
an institutional decision is one of a series of
decisions all of which contribute to the
benefit of the same institution. An individual
decision is one intended to serve the
interests of an individual; it recurs rarely
or never, and consequently the individual
cannot average his risks over many decisions.
Decisions reached in counseling and
guidance are individual decisions.
Introducing the concept of maximum
risk raises two questions: How great is the
maximum risk under the fixed α strategy,
and what light does this shed on the conclusions
of Lord and Mendenhall? What
procedure would guarantee that the maximum
risk does not exceed a specified value,
and how satisfactory is such a "maximum
φ" strategy? We begin with the first
question.

For convenience, we shall express x in
units such that x̄ = 0 and s_x = 1.
The scale for y will be given by the data
if y is a criterion score, but if y is a true
score ȳ = x̄ = 0 and s_y = s_x √r_xx = √r_xx.
Figure 1 shows the scatter diagram relating
x and y, however these may be defined.
Within any vertical array (x fixed)
y is normally distributed with
ȳ_x = ȳ + r_xy s_y x and s_y.x = s_y √(1 − r_xy²).
The line y = y' cuts this distribution at a
point whose location, expressed as a normal
deviate within the distribution, is m.
By the usual transformation,

\[
m = \frac{y' - (\bar{y} + r_{xy}\, s_y\, x)}{s_y \sqrt{1 - r_{xy}^{2}}} \qquad [1]
\]

φ is the proportion of the cases falling
above or below m. The strategy under
consideration will employ two cutoffs, as
shown in Fig. 2: if x < x', the person will
be classified as having y < y'; if x > x'',
as having y > y'. (Introducing x' and x'',
which need not be symmetric about zero,
formulates the problem more generally
than the use of ±k as cutting scores.)
Since we are concerned with the risk of
misclassification, the proportion above m
is taken as the risk φ when x < x', and the
proportion below m is used when x > x''.
Among those arrays where x < x', the
risk is greatest when x = x'; similarly,
among arrays where x > x'', the risk
is greatest when x = x''. Call these maximum
risks φ' and φ'', respectively; the
associated values of m may be referred to
as m' and m''.

For any specified y', x' and x'', the maximum
risks may be determined by substituting
in [1] to obtain m' and m'', and using
the table of the normal distribution to find the
area above m' (call this φ') and the area
below m'' (φ''). In the CTD interpretation
of difference scores, persons are to be classified
as above or below y' = 0. The fixed
α procedure sets x' = −√(2(1 − r_xx)) and,
symmetrically, x'' = +√(2(1 − r_xx)).
Noting that r_xy = √r_xx and ȳ = 0, and substituting
in [1], we find that m' = √(2 r_xx)
and m'' = −√(2 r_xx). Over the range
of values of r_xx, the CTD proposal fixing
α at .16 and y' at 0 leads to the following
consequences:


          Proportion of persons
          for whom differential
          interpretations are      Average risk    Maximum risk
  r_dd    made (p)                 (q_c)           (φ' = φ'')
  .95     .75                      <.01            .08
  .80     .53                       .02            .10
  .60     .37                       .06            .14
  .40     .27                       .11            .19
  .20     .21                       .19            .26
  .00     .16                       .50            .50


(The first three columns are similar to
Lord's.)


When r_dd → 1, m' → √2 and m'' → −√2,
so that under the CTD procedure, the
maximum risk is never less than .078. The
maximum risk is of course greater than
the average risk (save where r_dd = 0), and
becomes several times as large as q_c when
r_dd is large.
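The maximum-risk column follows directly from m' = √(2 r_xx). The sketch below generalizes this to any multiplier c, for which substitution in [1] gives m' = c √r_xx; it assumes the true-score model with y' = 0, and the function name `max_risk_fixed_alpha` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def max_risk_fixed_alpha(reliability: float, multiplier: float = sqrt(2.0)) -> float:
    """Risk of a reversed interpretation for a person whose score sits exactly at the cutoff."""
    m = multiplier * sqrt(reliability)  # m' = sqrt(2 r) when the multiplier is sqrt(2)
    return 1.0 - NormalDist().cdf(m)    # area above m' in the array at x = x'
```

With the default multiplier, reliabilities of .95, .80, .60, .40, .20, and .00 give values that round to .08, .10, .14, .19, .26, and .50, and a reliability of .42 gives .18, the 18% borderline error rate discussed earlier.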

The acceptable risk depends on the type 
of decision being made. In individual deci- 
sions (particularly counseling), it is gen- 
erally desirable to be conservative, seeking 
additional information rather than accept- 
ing a hazardous conclusion. When a ter- 
minal decision is under consideration, it 
appears reasonable to set the maximum 
risk at .10 or .05. An even lower level 
might be desired for an important decision 
that could not be reversed should it prove 
to be wrong in the light of later experience. 
On the other hand, some counseling inter- 
pretations are easily and cheaply reversed 
as more information comes to light (e.g., 
performance in verbal and mathematical 
courses may reverse an impression of difference
given by test scores). A risk of .20
seems none too high for a tentative deci- 
sion where reversal costs little. 

The problems considered to this point
are also pertinent to Mendenhall's (1959)
paper. His discussion is in many ways like
that of Lord, representing an attempt by
a test-publishing organization to state how
useful is differential information from one
of its tests. Following conventional statistical
logic, Mendenhall calls an observed
difference "significant at the .05 level" if
it exceeds 1.96 S.E._d. He then judges the
utility of the test by calculating the proportion
of cases expected to have "significant"
differences. When r_dd = .81, for example,
he finds that 39% of the persons have
significant difference scores. He thus implies
that a cutoff at ±1.96 S.E. would permit
decisions to be made about 39% of the
cases, with 5% risk of misclassification.
This is a fixed α strategy, differing from
the CTD proposal in that 1.96 replaces
1.41 as a multiplier, so that α is .05 instead
of .16. Mendenhall calculates, as does
Lord, the proportion p of cases for whom
a decision is reached. He fails, however, to
consider that the α risk is not an indication
of the dependability of the decisions
made, a matter with which the decision
maker is normally concerned. The table
above shows that both the average risk q_c
and the maximum risk φ may exceed α.
Moreover, we find that when r_dd = .81, we
can make decisions about only 26% of the
cases with a guaranteed φ risk no greater
than .05 (cf. Mendenhall's 39% above).
The risk of interpreting a null difference
(α) is not the same as the risk φ of
misinterpreting (reversing) a difference. The
α risk answers the question: Given a person
for whom the true difference is zero,
how likely are we to interpret an observed
difference for him? The φ risk answers the
question: Given a person with a borderline
observed difference, how likely are we to
be incorrect in interpreting that difference?
Mendenhall's analysis, though technically
accurate, by implication gives an unduly
favorable impression of the value of the
difference scores in question.

FIG. 2. SKETCH TO ILLUSTRATE MAXIMUM φ RISK
(axes: observed score x, true or criterion score y)


STRATEGY WITH FIXED MAXIMUM φ RISK


We turn now to the consideration of a
strategy designed to fix the maximum risks
φ' and φ'' at some stated level φ_m. It is
desired to use x scores to identify individuals
who are above and below some level
y', with a maximum risk φ_m of an erroneous
identification. Two cutting scores on the x
scale are determined: a lower score x' such
that when x = x', P(y > y') = φ_m, and
an upper score x'' such that when x =
x'', P(y < y') = φ_m (see Fig. 2). For
the given φ_m, m is determined from the
normal table. For the lower cutting score
x', φ_m is the proportion of cases in the
upper tail; hence, the sign of m is positive.
For the upper cutting score x'', φ_m is
the proportion of cases in the lower tail;
hence the sign of m is negative. Solving
[1] for x gives

\[
x = \frac{(y' - \bar{y}) - m\, s_y \sqrt{1 - r_{xy}^{2}}}{s_y\, r_{xy}} \qquad [2]
\]
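Equation [2] translates into a few lines of code. The following is a minimal sketch, assuming x is expressed in standard-score units (x̄ = 0, s_x = 1) and that the validity r_xy is positive; the function name `cutting_scores` and its parameter names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def cutting_scores(r_xy, phi_m, y_prime=0.0, y_mean=0.0, s_y=1.0):
    """Return (x_lower, x_upper) from Equation [2], with x in standard-score units."""
    m = NormalDist().inv_cdf(1.0 - phi_m)   # positive deviate with phi_m of the area above it
    spread = s_y * sqrt(1.0 - r_xy ** 2)    # standard deviation of y within an array
    x_lower = ((y_prime - y_mean) - m * spread) / (s_y * r_xy)  # lower cutoff: m positive
    x_upper = ((y_prime - y_mean) + m * spread) / (s_y * r_xy)  # upper cutoff: m negative
    return x_lower, x_upper
```

For classification about the mean with φ_m = .10, `cutting_scores(0.80, 0.10)` gives cutoffs that round to −.96 and +.96. For y' = +1 s_y, `cutting_scores(0.90, 0.10, y_prime=1.0)` gives values that round to .49 and 1.73.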

Tables 1 and 2 indicate the consequences
of applying such a maximum φ strategy at
various levels of reliability or validity.

In Table 1, persons are to be classified as
above or below average. Though this table
covers the problem Lord considered, detecting
positive and negative differences
within a profile, it also applies to making
any other decision about standing relative
to the group mean. For high reliability
(or validity) the results under our strategy
are very similar to the results reported by
Lord for the fixed α strategy, with respect
to both number of decisions made and average
risk. For low reliabilities, however, we
find that the number of decisions which can be
made with confidence is much lower than
Lord's report suggests. Even when r_dd = 0,
the fixed α strategy allows decisions regarding
16% of the subjects, but our strategy
allows none. The maximum φ strategy, on
the other hand, runs a greater risk of overlooking
a true difference than does the
fixed α strategy.


TABLE 1

UTILITY OF TEST FOR CLASSIFYING PERSONS AS ABOVE AND BELOW AVERAGE WITH
MAXIMUM RISK φ = .10

                                Percentage of persons
                                for whom decision
                                is reached
                                Any        Correct      Proportion   Average
Validity  Reliability   x''     decision   decision     correct      risk      α risk
                                (p)        (p_A + p_D)  (p_c)        (q_c)
  1.00       1.00        .00     100.0      100.0        1.000        .000      1.00
   .95        .90        .42      67.4       66.4         .985        .015       .18
   .90        .81        .62      53.5       52.2         .976        .024       .15
   .80        .64        .96      33.7       32.4         .961        .039       .11
   .70        .49       1.30      19.4       18.4         .948        .052       .07
   .60        .36       1.71       8.9        8.4         .936        .064       .03
   .50        .25       2.22       2.6        2.5         .932        .068
   .30        .09       4.07


TABLE 2

UTILITY OF TEST FOR CLASSIFYING PERSONS AS ABOVE OR BELOW y' = +1 s_y
WITH MAXIMUM RISK .10

           Identification of                Identification of
           superior cases                   nonsuperior cases                Total decisions
           Upper    % any    % correct Prop.   Lower    % any    % correct Prop.   % reached % correct Prop.
Validity   cutoff   decision decision  correct cutoff   decision decision  correct                     correct
           x''                                 x'
  1.00     1.00     15.9     15.9      1.00     1.00    84.1     84.1      1.00    100.      100.      1.00
   .95     1.47      7.1      6.9      .972      .63    73.6     73.0      .992     80.7      79.9     .990
   .90     1.73      4.2      4.0      .952      .49    68.8     67.9      .987     73.0      71.9     .985
   .80     2.21      1.4      1.3      .928      .29    61.4     60.0      .977     62.8      61.3     .976
   .70     2.73       .3      .25      .833      .13    55.2     53.3      .965     55.5      53.6     .965
   .60     3.37                                 −.03    48.8     46.8      .959     48.8      46.8     .959
   .50                                          −.22    41.3     39.1      .947     41.3      39.1     .947
   .40                                          −.43    33.4     31.4      .940     33.4      31.4     .940
   .30                                          −.73    23.3     21.5      .923     23.3      21.5     .923
   .20                                         −1.25    10.6      9.6      .906     10.6       9.6     .906
   .10                                         −2.74      .3       .3                 .3        .3


It is of interest that average risk q_c remains
much more constant over the possible
range of reliabilities than it does under
the fixed α strategy. The maximum φ
strategy to some degree overcomes the
difficulty which led Lord to suggest a fixed
q_c strategy in place of the fixed α procedure.
Fixed q_c and maximum φ strategies
are by no means identical, however. While
an average risk q_c of .01-.07 corresponds
to a maximum risk φ of .10, further calculations
indicate that q_c must be set near
.002 to guarantee that φ is no larger than
[…].

Table 2 deals with the situation where
it is desired to discriminate persons with
y > +1 s_y from those below that point.
The upper cutoff is used in identifying
persons with a marked superiority. The
lower cutoff is used to identify persons for
whom y is thought to be less than y'. (The
same values, with a change in sign of x'
and x'', apply when the test is used to
identify persons with a marked weakness.)

Our tables provide a corrective to the
optimism of Lord's table. According to
Table 1, a score with validity .80 or reliability
.64 permits classifying one-third of
the subjects as above or below average on
the criterion (or true score) with a maximum
risk of .10. Likewise, a difference
score of reliability .64 permits us to report
one-third of the subjects as having definite
positive or negative differences with, at
most, 1 chance in 10 of being incorrect. If
we set the tolerable risk φ_m at 1 in 20,
reliability must be about .85 to permit an
equal number of decisions. According to
Table 2, the test of low reliability or validity
permits a somewhat greater number of
decisions as y' moves away from the mean,
but this gain is mostly in singling out
nondeviates, i.e., persons for whom y is not
more extreme than y'. A test with reliability
.64 identifies, at the desired level of
confidence, less than one-tenth of the superior
persons for whom y > +1 s_y.

It somewhat oversimplifies the problem
to treat all errors of classification as equally
serious. In a specific situation, the most
satisfactory analysis of the usefulness of a
test and decision-making strategy would
usually be obtained by specifying for each
y the exact benefit or loss from each possible
decision (Cronbach & Gleser, 1957,
pp. 44 ff.). A much simpler formulation,
however, will often be appropriate. Suppose
that in evaluating certain difference
scores for counseling purposes, it is recognized
that large differences are much more
important to detect than small ones. The
level y' may be specified so as to distinguish
between true differences regarded as important
and those regarded as trivial. Using
Equation [2], cutting scores may be determined
so as to permit the judgment y > y'
or y < −y' with an acceptable risk of
error. Two symmetric cutting scores are
determined: a lower score x' such that for
x < x', P(y > −y') < φ_m, and an upper
score, x'', such that for x > x'', P(y < y') <
φ_m. Interpretations or decisions are made
for persons for whom x < x' and x > x''.
If x' < x < x'', no decision is made. The
operation of this strategy may be illustrated
by reference to Table 2, if we suppose that
true differences less than 1 SD in absolute
value are considered negligible. Then the
number of persons confidently (φ_m < .10)
identified as having a large difference in one
direction or the other is obtained by doubling
the entries in the p_A + p_B column,
and the average accuracy is given by p_c
(without doubling). Obviously, a test can
identify very few persons as having differences
greater than 1 SD if its reliability is
below .80 (index of reliability below .90).


DISCUSSION 


We have identified three risks which
may be taken into account in fixing strategies
for test interpretation and for evaluating
the usefulness of a test interpreted by
a particular strategy. It is assumed that
the persons are to be divided into three
classes: those whose true scores (or
differences between scores) are believed to
be greater than a specified criterion score
y'; those believed to have true scores less
than y'; and persons for whom no interpretation
may safely be made.

The risk α is the risk of interpreting a
score as indicating y > y' or y < y' when
y actually equals y'. The risk q_c is the
average risk, over all decisions made, of
concluding that y > y' when y is actually
less than y', and vice versa. The risk
φ_m is this same risk of misinterpretation
for persons at the score where the risk of
misinterpretation is greatest. The CTD
and Mendenhall suggest a strategy which
fixes the α risk at a predetermined value.
Fixing the α risk is a logically defensible
procedure for establishing a strategy. It is not
appropriate, however, to describe the utility
of decisions actually made in terms of
the α risk, as Mendenhall does. The answer
to such a question is contained in the q_c
and φ_m risks. The former is more important
in institutional decisions (e.g., selection,
classification) and the latter in individual
decisions (e.g., counseling). A strategy
designed to fix either q_c or φ_m, depending on
the type of decision, is logically to be preferred
to a fixed α strategy. The fixed
α strategy, as Lord shows, is unduly conservative
when applied to highly reliable
tests; moreover, it results in a high rate
of error for tests of very low reliability.

We agree with Lord and Bloom that
there is no arbitrary level of validity or of
reliability which makes a score useful. The
suitability of a test depends upon these
coefficients, but it also depends upon the
importance of the decisions to be made and
the strategy by which scores are to be converted
into interpretations. Where a test is
used, for example, to identify those persons
who are clearly above or below the mean,
a test of reliability .49 yields decisions
(φ_m = .10, q_c = .05, α = .07)
for about 19% of those tested. This may
or may not be a profitable information
yield, depending on the situation, cost of
testing, etc. Where the primary aim is to
select superior individuals (more than 1 SD
above the mean), the test reliability must
reach .86 before as many as one-third of
the superior individuals are identified with
φ_m = .10 and q_c = .035. On the other hand, a
test of reliability .36 is capable of identifying
more than half of those who are definitely
not superior; as the first stage in a
sequential screening process, this test can
identify persons who need be given no further
consideration.


The test designer and selector of tests 
must abandon his quest for a rule of 
thumb, and instead interpret Tables 1 and 
2 (and similar tables for other decision 
problems and risk levels) in the light of his 
particular situation. For the typical coun- 
seling decision, it is our opinion that the 
maximum individual risk φ_m is the most
important consideration in determining the 
interpretability of scores. From this point 
of view, the difference scores for certain 
published batteries discussed by Lord and 
Mendenhall are somewhat less useful than 
their papers imply. 

REFERENCES 

Arbous, A. G. Tables for aptitude testers.
Goldfields: Nat. Inst. for Person. Res.,
South African Council for Scient. and
Industr. Res., 1952.

Arbous, A. G., & Sichel, H. S. On the
economics of a pre-screening technique
for aptitude test batteries. Psychometrika,
1952, 17, 331-346.

Bennett, G. K., & Doppelt, J. E. The
evaluation of pairs of tests for guidance
use. Educ. psychol. Measmt., 1948, 8,
319-325.

Bloom, B. S. Test reliability for what? J.
educ. Psychol., 1942, 33, 517-526.

Cronbach, L. J., & Gleser, Goldine C.
Psychological tests and personnel decisions.
Urbana: Univer. Illinois Press,
1957.

Educational Testing Service.
Examiner's Manual, Cooperative School
and College Ability Tests. Princeton:
Author, Cooperative Test Div., 1955.
Pp. 30-35.

Kelley, T. L. A new method for determining
the significance of differences in
intelligence and achievement scores. J.
educ. Psychol., 1923, 14, 321-333.

Lord, F. M. The utilization of unreliable
difference scores. J. educ. Psychol.,
1958, 49, 150-152.

Mendenhall, G. V. Analysis of differences
between language and non-language IQ's
of the California test of mental maturity.
Hollywood: California Test Bureau,
1959. (Mimeo.)


Pearson, K. Tables for statisticians and 
biometricians. Il. London: Cambridge 
Univer. Press, 1924. 

Taylor, H. C., & Russell, J. T. The relationship
of validity coefficients to the
practical effectiveness of tests in selection.
J. appl. Psychol., 1939, 23, 565-578.


Received April 20, 1959. 




THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Volume 50 


December 1959 


Number 6 


SOME BEHAVIORAL CORRELATES OF TEACHER
EFFECTIVENESS¹


DONALD M. MEDLEY AND HAROLD E. MITZEL
Division of Teacher Education, Municipal Colleges of New York City 


The Research Office of the Division of
Teacher Education of the City of New
York has conducted, beginning in 1953, a
longitudinal study of graduates of the
teacher education program in the four
municipal colleges: City, Hunter, Brooklyn,
and Queens. This study has followed
the general plan proposed by the Committee
on Criteria of Teacher Effectiveness
appointed by the American Educational
Research Association in 1950 (Barr, Bechdolt,
Coxe, Gage, Orleans, Remmers, &
Ryans, 1952). The Committee suggested
the definition and measurement of dimensions
of teacher effectiveness and of teacher
behavior, and the study of relationships
between these two sets of dimensions. The
purpose of this study was to carry out a
phase of the second step listed above by
examining the relationships between some
measures of teacher effectiveness and some
teacher behavior measures obtained in the
course of a study of the graduates of a
coordinated teacher preparation program
in the municipal colleges of New York
City.

¹ Grateful acknowledgement is due to
Blanche Logan and Yo Yee Chacko for
valuable assistance in the analysis of the
data.


SAMPLE


In 1954, a follow-up was made to locate
those graduates from the student teaching
class of 1953-1954 who were offered
teaching positions in New
York City public elementary schools. A
search was then made to find all schools 
in which two or more of these teachers 
were assigned to teach Grades 3, 4, 5, or 
6. Of the 75 teachers who met these cri- 
teria, 19 were eliminated because the 
school was inaccessible, the principal did 
not wish to cooperate, or one or more of 
the teachers failed to accept the position 
or did not wish to cooperate. Of the 56 
teachers who began the study seven 
dropped out during the year. (When one 
teacher in a school with two participating 
teachers resigned, the other teacher in that 
school was perforce also dropped.) Thus 
the sample on which the study was based 
comprised 49 teachers in all. Forty-six 
of the 49 teachers were women; 23 were 
teaching Grade 3, 13 Grade 4, 9 Grade 5, 
and 4 Grade 6. The teachers were scattered 
among 19 schools in the boroughs of Man- 
hattan, Brooklyn, the Bronx, and Queens. 


VARIABLES 


The study employed five variables purporting
to measure one or another aspect
of teacher effectiveness, three measuring
dimensions of classroom behavior, and a
number of variables designed to control
extraneous variation.


Measures of Teacher Effectiveness 


Adjusted reading growth. To measure
growth in reading ability, four subtests of
the California Reading Test (Elementary)
(Word Form, Word Recognition, Meaning
of Similarities, and Interpretation of
Meanings) were administered to each class
at the beginning of the year, and an equiv- 
alent form of the reading test was adminis- 
tered at the end of the year. The final 
reading test raw score of each child was 
adjusted by covariance analysis for his 
initial reading test score and for his in- 
telligence raw score based on an admin- 
istration of the Non-Language Section of 
the California Test of Mental Maturity. 
The within-classes regression coefficient 
was used as the best estimator of the 
mental ability effect to be removed. The 
teacher whose class showed the highest 
mean adjusted final raw score—that is, 
the greatest average improvement in read- 
ing—was considered most effective in stim- 
ulating pupils to learn to read. The 
reliability of these adjusted reading growth 
scores was estimated to be .84 (Mitzel & 
Medley, 1957). 
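The covariance adjustment described above can be illustrated with a minimal sketch. It assumes one row per pupil and a linear adjustment on two covariates (initial reading score and ability score); the function name `adjusted_class_means` and the argument layout are hypothetical and are not taken from the original analysis.

```python
import numpy as np

def adjusted_class_means(final, initial, ability, class_id):
    """Class means of final scores, adjusted for initial score and ability
    using pooled within-classes regression coefficients (illustrative only)."""
    y = np.asarray(final, dtype=float)
    X = np.column_stack([initial, ability]).astype(float)
    class_id = np.asarray(class_id)
    classes = np.unique(class_id)

    # Express every pupil's scores as deviations from his own class means.
    Xw, yw = X.copy(), y.copy()
    for c in classes:
        m = class_id == c
        Xw[m] -= X[m].mean(axis=0)
        yw[m] -= y[m].mean()

    # Pooled within-classes regression of final score on the two covariates.
    b, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

    # Shift each class mean to the grand mean of the covariates.
    grand = X.mean(axis=0)
    return {
        c.item(): y[class_id == c].mean()
        - (X[class_id == c].mean(axis=0) - grand) @ b
        for c in classes
    }
```

Under these assumptions, the class with the highest adjusted mean corresponds to the teacher the text would consider most effective in reading.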


Growth in group problem solving skill. 
To measure growth in group problem 
solving skill the Russell Sage Social Re- 
lations Test (Damrin, 1954) was used. 
This test requires the class as a group 
(without the teacher) to devise and exe- 
cute a plan for constructing a replica of 
a model shown them, using special inter- 
locking blocks provided by the examiner. 
It was administered to each class at the 
beginning and end of the school year. 
The protocols were scored in the pre- 
scribed manner by D. Damrin. This scor- 
ing procedure does not yield a single 
quantitative measure of group problem 
solving skill but, rather, 14 discrete rat- 
ings which are meant to be used to locate 
each class in a two-dimensional plane. The 
14 ratings obtained on the second admin- 
istration of the test were combined using 
weights obtained by the method of re- 
ciprocal averages (Horst, 1935; Mosier, 
1946). The composite scores lay along the 
principal common factor of the test. 
The weights obtained in the second administration
were used to combine the 14
scores from the first administration into a
single composite. Scores on the second administration
were predicted from scores on
the first administration, and the deviation
of the actual second score of a class from
its predicted score was taken as a measure
of the effectiveness of the teacher in improving
the skill of the class in group
problem solving. The reliability of these
growth scores is not known, but scores on
the first administration had an estimated
reliability of .85 based on an internal consistency
analysis.
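The growth score just described (actual second score minus the score predicted from the first administration) can be sketched as a residual from a regression across classes. The text does not state the form of the prediction equation; a simple linear regression is assumed here, and the function name `residual_gain` is hypothetical.

```python
import numpy as np

def residual_gain(first, second):
    """Deviation of each class's actual second score from the score
    predicted by a linear regression on its first score (assumed form)."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    slope, intercept = np.polyfit(first, second, 1)  # least-squares line
    predicted = intercept + slope * first
    return second - predicted

# Three hypothetical classes: composite scores at start and end of year.
print(residual_gain([1, 2, 3], [2, 4, 9]))
```

A positive value marks a class that finished above what its starting score predicted; the values sum to zero across classes by construction.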

Pupil-teacher rapport. Ratings of pupil-teacher
rapport were obtained directly
from pupils’ reactions by administering the
My Class Inventory (Medley & Klein,
1957) to each class at the end of the school
year. The “halo” score derived from the
inventory is based on pupils’ responses
to the following eight items:


Do you ever feel like staying away from school?
Do you like to be in this class?
Do you have much fun in this class?
Do you learn a lot in this class?
Are you proud to be in this class?
Do you always do your best in this class?
Do most of the pupils like the teacher?
Does the teacher help you enough?


The reliability of the teacher's score on
this scale was .89 in the group of
teachers.
Teachers’ self-ratings. At the end of the
school year each teacher was asked to
judge how well she had played each of
three roles a teacher must play. In Role I,
the teacher “is responsible for providing
learning experiences which will result in
students’ acquisition of fundamental
knowledge.” In Role II, the teacher is
“responsible for providing children with
learning experiences ... leading to good
citizenship, personal satisfaction, and
self-understanding.” In Role III, the teacher
is “a professional colleague of [...] teachers,
supervisors, and administrators.” The
teacher was asked to indicate where she
would stand in a typical sample of
teachers with respect to her effectiveness
in playing each of these roles. Only the
self-rating on Role I (teaching fundamentals)
was used in this analysis.

Principals' ratings. The principal or the
assistant principal responsible for supervising
each teacher (or both, when available)
was asked to evaluate each teacher
in the manner described above on each
of the same three roles. An analysis of
variance failed to reveal any significant
differences between ratings of the same
teacher on different roles, so all ratings of
each teacher were pooled. This pooled
rating was regarded as a rating of over-all
effectiveness of the teacher by her supervisor
or supervisors, and had a reliability
coefficient of .89.


Dimensions of Classroom Behavior


Classroom behavior was measured with
the technique of Medley and Mitzel
(1958). Six observers each visited all 49
teachers twice at different times over a
period, and made an objective
record of behaviors for half an hour on
each visit. Scores on three orthogonal dimensions
were developed by factor analysis of
these records.

Emotional climate (reliability .90) refers
to the amount of hostility observable
in a classroom; a high score indicates a
classroom in which external manifestations of
warmth and friendliness are common and
hostile reactions rare. Behaviors typical of
the classroom in which Emotional Climate
is high are:


Teacher calls pupil “dear,” etc.
Teacher demonstrates affection for pupil
Pupil demonstrates affection for teacher
Teacher makes pupil-supportive statement


Behaviors typical of the classroom in
which Emotional Climate was low include:


Teacher uses sarcasm
Teacher makes reproving remark
Teacher frowns, glares, etc.
Pupil ignores teacher’s questions
Pupil scuffles or fights
Pupil whispers


Verbal emphasis (reliability .77) indi- 
cates the degree to which verbal activities 
predominate. Behaviors typical of a class 
with high Verbal Emphasis include: 


Pupil reads or studies at his seat
Pupil writes or manipulates object at his seat
Pupil (or teacher) uses textbook or workbook
Pupil (or teacher) uses supplementary reading matter
Pupil (or teacher) uses writing materials


In addition, such a class was observed to
be having a reading lesson relatively often
and a social studies lesson infrequently.

Social organization (reliability .83) has 
to do with the amount of social grouping 
and pupil autonomy in a class. A class 
scoring high was one in which it was rela- 
tively common to find the class broken up 
into two or more groups working inde- 
pendently, and in which the teacher talked 
relatively little. 


Control Measures 


It is well known, and often pointed out, 
that pupil learning is a function of many 
variables of which the skill of the teacher 
is only one. If the part played by the 
teacher is to be detected, it is essential 
that as many as possible of these other in- 
fluences be controlled. Most of these dif- 
ferences are apparent either as differences 
between schools and communities or as 
differences between classes within the same
school.

Class differences within schools. As par- 
tial controls on the characteristics of the 
class taught by each teacher, the class 
mean scores on the Non-Language Sec- 
tion of the California Test of Mental Ma- 
turity and on the initial administration of 
the Russell Sage Social Relations Test 
were used. It was felt that information
about the average mental maturity of a
class would be valuable in studying relationships
between behavior and effectiveness.
The individual slow student, for example,
would have a very different school
experience with a given teacher in a class
of bright pupils than he would have had
with the same teacher in a class all as 
slow as he is. Therefore, in addition to the 
control on individual ability incorporated 
in the growth measure by the covariance 
adjustment, these controls on the average 
ability of the class seemed important. 

Similarly, the initial score of a class on 
the Russell Sage Social Relations Test was 
felt to contain information about the class- 
room environment which would be helpful 
in explaining the behavior of the class. A 
class able at the beginning of the year to 
discipline itself well enough to devise a 
feasible plan for constructing the block 
model and to carry it out without the 
teacher’s supervision presents a very dif- 
ferent problem to the teacher than is 
presented by a class which disintegrates 
whenever the teacher is not there to discipline
it.

A third control on differences between 
classes in the same school was obtained 
from the grade level. The grade number 
of the class—three, four, five, or six—was 
simply inserted in each multiple regres- 
sion equation to remove part at least of 
any differences among classes that might 
be related to grade level. 

TABLE 1
WITHIN-SCHOOLS INTERCORRELATIONS AMONG FIVE MEASURES OF TEACHER EFFECTIVENESS

Measure                                           |   1   |   2   |   3   |   4    |   5
1. Average Adjusted Reading Growth                |       | +.063 | -.002 | +.133  | +.405*
2. Average Growth in Group Problem-Solving Skill  |       |       | +.250 | +.067  | +.179
3. Pupil-Teacher Rapport                          |       |       |       | +.378* | -.054
4. Supervisors' Ratings                           |       |       |       |        | +.123
5. Teacher's Self-Rating (Teaching Fundamentals)  |       |       |       |        |

* Significant, .05 level (29 df).


Differences between schools and neighborhoods.
The effects of the type of school
and community in which a beginning
teacher finds herself can hardly be ignored
in any attempt to relate her success
in teaching to her own and her pupils'
behavior or to other characteristics of the
teacher. Such influences were controlled
in this study by analysis of covariance applied
in a manner similar to that described
by Kendall (1948). All variances and covariances
were computed separately between
and within schools, and only correlations
estimated from covariation within
schools were used in the study. Thus, since
all comparisons were made between teachers
and pupils within the same schools, the
effects of school and neighborhood differences
were effectively removed.²
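The within-schools procedure can be sketched as follows: each teacher's score is expressed as a deviation from the mean of her own school, and the correlation is computed from the pooled within-school sums of squares and products. This is an illustrative sketch, not the original computing routine; the function name `within_group_corr` is hypothetical.

```python
import numpy as np

def within_group_corr(x, y, group):
    """Correlation estimated from covariation within groups only."""
    x = np.asarray(x, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    group = np.asarray(group)
    for g in np.unique(group):
        m = group == g
        x[m] -= x[m].mean()  # remove the school mean of x
        y[m] -= y[m].mean()  # remove the school mean of y
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

# Two hypothetical schools whose overall levels differ greatly.
x = [1, 2, 3, 11, 12, 13]
y = [3, 2, 1, 23, 22, 21]
school = ["A", "A", "A", "B", "B", "B"]
print(within_group_corr(x, y, school))
```

In this toy example the ordinary correlation over all six teachers is strongly positive because one school scores higher on both variables, while the within-schools correlation is negative. With 49 teachers in 19 schools there are 49 - 19 = 30 within-schools degrees of freedom, which leaves 29 for a correlation, consistent with the 29 df reported with Table 1.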


ANALYSIS OF DATA 


The five variables intended to measure
teacher effectiveness were intercorrelated
with the results shown in Table 1. The results
suggest that two aspects of effectiveness
are being measured by these five sets
of scores. One set seems to be related to
the teacher's ability to teach reading and
the other to her ability to establish good
rapport with pupils. Teacher self-ratings
on the teaching of fundamentals appear
to reflect the former ability; supervisors'
ratings the latter.

This apparent finding that the two
kinds of ratings of effectiveness reflect
different aspects of teacher effectiveness
suggested further analysis of the data.
Each rating was regressed on the three
direct measures of effects on pupils and
on the three control variables; the results
after control variables which did not
contribute to the prediction were removed
are shown in Table 2.

Supervisors’ ratings had a multiple correlation
of .44 with the five variables used.
The only appreciable beta weight was
with pupil-teacher rapport. Teacher self-ratings
on the teaching of fundamentals,
with a multiple correlation of .45, appeared
chiefly determined by pupil growth.
Clearly, these five measures tap two aspects
of teacher effectiveness: one related
to pupil learning, the other to pupil
morale.

² Examination of between-schools variation
indicated that considerable [...]
of the findings was removed by this technique.


TABLE 2
RELATIONSHIPS BETWEEN RATINGS OF TEACHER EFFECTIVENESS AND OTHER MEASURES OF TEACHER EFFECTIVENESS

Beta weights: the first three are for measures of effectiveness, the last three for control variables.

Rating         | Multiple Correlation | Average Adjusted Reading Growth | Growth in Group Problem-Solving Skill | Pupil-teacher Rapport | Average Mental Maturity | Grade Level | Initial Group Problem-Solving Skill
By supervisors | .44                  | [...]                           | -.08                                  | +.39                  | +.19                    | not used    | -.19
By teacher     | .45                  | +.39                            | +.17                                  | -.09                  | +.03                    | not used    | not used


TABLE 3
RELATIONSHIPS BETWEEN TEACHER EFFECTIVENESS AND CLASSROOM BEHAVIOR

Beta weights: the first three are for classroom behavior dimensions, the last three for control variables.

Criterion of Effectiveness            | Multiple Correlation | Emotional Climate | Verbal Emphasis | Social Organization | Average Mental Maturity | Grade Level | Initial Group Problem-Solving Skill
Average Adjusted Reading Growth       | .55**                | +.20              | +.09            | 0                   | not used                | +.52        | not used
Growth in Group Problem-Solving Skill | .26                  | +.06              | -.09            | +.09                | +.24                    | not used    | -.26
Pupil-teacher Rapport                 | .49                  | +.32              | +.28            | +.09                | not used                | -.24        | not used
Supervisors' Ratings                  | .56*                 | +.52              | -.01            | +.10                | +.20                    | not used    | -.32
Teacher's Self-Rating                 | .48                  | +.10              | 0               | -.44                | +.28                    | not used    | not used

* p < .10, df = 25.
** p < .05, df = 26.


Relationship of Effectiveness to Behavior


In order to answer the main question,
that of the relationship between classroom
behavior and teacher effectiveness, the
multiple regression technique was again
used. Six independent variables, the
three dimensions of behavior and the three

controls on class differences, were em- 
ployed, with only the control variables 
which actually functioned in a given equa- 
tion being retained. School differences were 
again controlled by the within and be- 
tween procedure. Each of the five meas- 
ures of effectiveness was used in turn as 
a dependent variable. The results appear 
in Table 3. 
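The beta weights and multiple correlations reported in the tables can be illustrated with a short sketch of standardized multiple regression. The code below is a generic illustration under stated assumptions, not the original computation: it omits the within-schools step (in the study the inputs would first be expressed as deviations from school means), and the function name `beta_weights` is hypothetical.

```python
import numpy as np

def beta_weights(X, y):
    """Standardized regression (beta) weights and multiple correlation R."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize predictors
    yz = (y - y.mean()) / y.std()              # standardize criterion
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    r_xy = Xz.T @ yz / len(yz)                 # predictor-criterion correlations
    R = float(np.sqrt(beta @ r_xy))            # R^2 = sum of beta_j * r_jy
    return beta, R

# Hypothetical data: two uncorrelated predictors, criterion = 2*x1 + x2.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3, -1, 1, -3]
beta, R = beta_weights(X, y)
print(beta, R)
```

Because the weights are on a standardized scale, they can be compared across predictors measured in different units, which is how the text reads the relative size of the weights in Table 3.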

Measured growth of pupils in reading 
ability based on raw score increments 
shows very little relationship to any of 
the three dimensions of behavior, but 
seems to depend only on the grade level
in question; apparently (when allowances
for individual differences in ability are 
made) there is a tendency for growth in 
reading (as measured in this study) to 
increase from grade to grade over this 
range. These findings suggest that a gen- 
eralized maturational factor tends to out- 
weigh other factors in the assessment of 
reading growth in the elementary schools. 
We recommend investigators of this prob- 
lem restrict their studies to a single ele- 
mentary grade at a time in order to pro- 
vide a local control of this maturational 
factor. 

Growth of pupils in group problem 
solving skill, as measured by improvement 
in Russell Sage Social Relations Test 
scores, seems not to be related to any
appreciable degree either to recorded 
classroom behavior or to differences among 
classes within the same school. 

Pupil-teacher rapport seems most 
closely related to Emotional Climate, rap- 
port (naturally enough) being highest 
where Emotional Climate is warmest. 
There is a suggestion that rapport is likely 
to be better when the emphasis on verbal 
activities is above average, and a hint 
that it is lower at the upper elementary 
grade levels. 

Supervisory ratings are related appreci- 
ably only to Emotional Climate among the 
behavior dimensions. Apparently a super- 
visor thinks that the teacher whose class 
is friendly and orderly is an effective 
teacher. There is a small negative weight 
on initial social relations scores.

Teachers who rate themselves as highly 
effective in teaching fundamentals tend to 
allow their pupils less opportunity to work 
in small, independent groups than those 
who rate themselves as less effective. Their 
classes seem likely to be slightly higher in
average mental ability.


DISCUSSION


Previous research in the measurement
of teacher effectiveness has tended to indicate
that supervisory ratings of teacher
effectiveness and measures of how much
pupils are learning from the teacher have
little in common. Typical conclusions
drawn from such studies are:


Teacher rating scales...are only slightly
related to the observed pupil growth
(Hellfritzsch, 1945).

...evaluations based on...supervisors'
ratings and those based on measures of pupil
growth and achievement were not significantly
correlated (Anderson, 1954).

...Supervisory ratings here provided are
invalid [as measures of pupil gain]
(LaDuke, 1945).

...Supervisory ratings...seem to lack
reliability and validity [as measures of pupil
gain] (Jayne, 1945).

The criterion of pupil change apparently
measures something different than that
measured by teacher ratings (Gotham, 1945).

The three criteria...[pupil gain, pupil
evaluations, and a composite of five supervisory
ratings] are not related to a greater
degree than can be attributed to chance
(Lins, 1946).

Whatever pupil gain measures in relation
to teaching ability it is not that emphasized
in supervisory ratings (Jones, 1946).

Employers' ratings of teaching ability are
not related to pupil gains in information
(Brookover, 1945).


The conclusions cited above are consistent
with the zero order correlations
between supervisors' ratings and the two
measures of growth reported in Table 1:
+.133 and +.067. The results of the present
study further suggest that supervisors'
ratings do not correlate with growth because
they reflect the emotional climate
of the class, rather than how much the
pupils are learning. Perhaps it is
unreasonable to expect a supervisor to tell
how much a class is learning just by looking
at it. The notion that he can do so
seems to be based on two assumptions:
that there is a pattern (or set of patterns)
of behavior exhibited whenever
pupil learning takes place, and that the
supervisor can recognize this kind of behavior
when he sees it.

The data about relationships between
classroom behavior and pupil growth reported
in this study do not support the
first assumption, at least for the types of
growth measured here. The inductive way
in which the three behavior dimensions
used were obtained (Medley & Mitzel,
1958) indicates that they probably exhaust
the more obvious behavioral aspects in
which classrooms seem to differ. If there
are any ways in which teachers and
pupils behave whenever the pupils are
growing in reading skill, they are not
readily apparent to reasonably sophisticated
classroom visitors. Raters of teacher
effectiveness must seek subtler cues than
these. There is no indication here of what
these cues may be.

Unfortunately, so few attempts have
been made to relate teacher behavior to
pupil growth (cf. Jayne, 1945; Morsh,
Burgess, & Smith, 1956) that it is not
clear whether supervisory ratings fail to
reflect pupil learning because it is impossible
to assess learning in process, or
because supervisors in general do not
know what to look for. These findings
call into question the relevance of
the results of a considerable body of research
in teacher effectiveness which has
used ratings of some kind as a criterion of
teacher effectiveness.

The finding that teachers are fair
judges of their own effectiveness in teaching
pupils to read was also reported by
McCall (1952).

All these results, tentative as they
are, raise more questions than they answer,
and point up the importance, indeed the
urgency, of more research along the lines
laid down in the report of the AERA
Committee (cited above), research which
defines and measures dimensions of effectiveness
and of classroom behavior and
relates the latter to the former.

The problem of relating behavior of
teachers to effects on pupils is crucial not
only to research in teacher effectiveness
but also to the future of teacher education
itself. If the main objective of the
professional part of teacher education is
to teach teachers how to teach, it is highly
desirable (to say the least) that clear-cut
research evidence be obtained showing how 
the teacher must teach in order to bring 
about optimum pupil growth, and that 
such findings be made a part of every 
teacher’s preparation. The amount of re- 
search, completed or underway, which can 
yield such evidence is, to repeat, astonish- 
ingly small. 

It may appear that a rather extensive 
network of inferences about teacher ef- 
fectiveness has been built from this study 
on a small amount of data—after all, only 
49 teachers were involved. It should, there- 
fore, be pointed out that the conclusions 
drawn are tentative, and are of interest 
mainly as illustrating what useful and far- 
reaching findings could be realized in more 
extensive studies along similar lines, studies 
employing criteria of effectiveness derived 
from measures of pupil growth. 


SUMMARY


Five measures of effectiveness and measures
of three dimensions of classroom behavior
were obtained on 49 beginning
teachers in New York City public elementary
schools, and analyzed with statistical
controls on differences between
schools and differences between classes
within schools. The five measures of effec- 
tiveness were found to center around two 
distinct aspects of effectiveness. Super- 
visory ratings and pupils’ reactions to their 
teachers appeared to reflect the teacher's 
ability to get along with children; teach- 
ers’ self-ratings and measures of pupil 
gains (in reading and social skill) appeared 
to reflect effectiveness in stimulating pu- 
pils to learn to read. 

An attempt was made to find out what 
kind of classroom behaviors were asso- 
ciated with each type of effectiveness. 
Neither measured gains in reading nor 
gains in group problem solving skill were 
found to be related to recorded classroom 
behaviors of teachers and pupils. Pupil- 
teacher rapport was found to be related 
to emotional climate and, probably, to
verbal emphasis in classroom behavior.
Supervisors rated those teachers who had 
the friendliest classrooms as most effective. 
Teachers who rated themselves most ef- 
fective in teaching fundamental skills 
tended to allow their pupils less oppor- 
tunity to work in small, autonomous social 
groups. 


REFERENCES 


ANDERSON, H. M. A study of certain criteria
of teacher effectiveness. J. exp. Educ.,
1954, 23, 41-71.

BARR, A. S., BECHDOLT, B. V., COXE, W. W.,
GAGE, N. L., ORLEANS, J. S., REMMERS,
H. H., & RYANS, D. G. Report of the
Committee on Criteria of Teacher Effectiveness.
Rev. educ. Res., 1952, 22,
238-263.

BROOKOVER, W. B. The relation of social factors
to teaching ability. J. exp. Educ.,
1945, 13, 191-205.

DAMRIN, DORA. The Russell Sage Social Relations
Test: A measure of group problem-solving
skills in elementary school
children. Proceedings of the Invitational
Conference on Testing Problems, 1954,
75-84.

GOTHAM, R. E. Personality and teaching efficiency.
J. exp. Educ., 1945, 14, 157-165.

HELLFRITZSCH, A. G. A factor analysis of
teacher abilities. J. exp. Educ., 1945, 14,
166-199.

HORST, PAUL. Measuring complex attitudes.
J. soc. Psychol., 1935, 6, 369-374.

JAYNE, C. D. A study of the relationship
between teaching procedures and educational
outcomes. J. exp. Educ., 1945,
14, 101-134.

JONES, R. D. The prediction of teaching efficiency
from objective measures. J. exp.
Educ., 1946, 15, 85-99.

KENDALL, M. G. The advanced theory of
statistics. London: Griffin, 1948.

LADUKE, C. V. The measurement of teaching
ability: Study No. 3. J. exp. Educ., 1945,
14, 75-100.

LINS, L. J. The prediction of teaching efficiency.
J. exp. Educ., 1946, 15, 2-60.

MCCALL, W. A. Measurement of teacher merit.
Raleigh, N.C.: State Superintendent of
Public Instruction, 1952. Publ. No.

MEDLEY, D. M., & KLEIN, ALIX A. Measuring
classroom behavior with a pupil-reaction
inventory. Elem. sch. J., 1957,
57, 315-319.

MEDLEY, D. M., & MITZEL, H. E. A technique
for measuring classroom behavior. J.
educ. Psychol., 1958, 49, 86-92.

MITZEL, H. E., & MEDLEY, D. M. Pupil
growth in reading: An index of effective
teaching. J. educ. Psychol., 1957, 48,
227-239.

MORSH, J. E., BURGESS, G. G., & SMITH, P.
N. Student achievement as a measure
of instructor effectiveness. J. educ. Psychol.,
1956, 47, 79-88.

MOSIER, C. I. Machine methods in scaling
by reciprocal averages. Proceedings, Research
Forum, New York, IBM Corporation,
1946, 35-39.


(Received June 24, 1959) 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 6, 1959


THE DEVELOPMENT OF UNDERSTANDING IN
ARITHMETIC BY A TEACHING MACHINE¹


EVAN R. KEISLAR 
University of California, Los Angeles 


The use of teaching machines for the
teaching of spelling and arithmetic combinations
has already been shown to have
merit (Skinner, 1954; Pressey, 1927).
And studies have demonstrated that automated
teaching can result in more than
simple rote learning (Porter, 1957; Ferster
and Sapon, 1958). In this study, the problem
was to explore the possibility of using
a multiple-choice method for the automated
teaching of “understanding,” specifically,
an understanding of areas of
rectangles. By understanding is meant the
ability to answer a variety of questions different
from those encountered during
training but belonging to the same general
class; the broader this class is, the greater
the understanding.

Essentially, the paper describes an attempt
to devise a program for the teaching
of understanding, together with the
principles underlying its construction and
the weaknesses that were encountered in the
use of the program. No information is
available as to what a comparable group
of pupils would have learned in regular
classes. To have provided such information
would have required a fairly large
sample of teachers and classes. But more
important is the fact that, at this writing,
it is premature to compare the teaching
machine approach with regular classroom
instruction. Programs for use with such
machines need improvement before such
comparisons can have much meaning.
¹ Appreciation is expressed to the staff of
the University Elementary School of the
University of California, Los Angeles, and
[...] for their cooperation
in this study.


Apparatus


The teaching machine used in this study
was an extensive adaptation of the Film
Rater used by the Navy for teaching aircraft
identification. Multiple-choice items
on a Kodachrome strip-film were projected
in sequence upon a viewing plate. The
learner responded to each item by pressing
one of five buttons. If the answer was
correct a green light was turned on and 
the next item could be brought into view 
by pressing a special button. But if this 
answer was wrong a red light came on; 
only after turning off this red light could 
the learner try again. To proceed to the 
next item the learner had to answer cor- 
rectly. 

A special device recorded a graph of all 
right and wrong answers for each item. 
If the wrong answer was given the pen 
moved to the right one-twentieth of an 
inch. For each correct answer the pen 
moved vertically an equal distance to a 
new line. Hence the subject’s performance 
for any item could be read from the hori- 
zontal line on the graph corresponding to 
this item number. 
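The control logic of the machine and its recorder, as described above, can be summarized in a short sketch. It models only what the text states (an item is repeated until answered correctly, and each wrong answer moves the pen one-twentieth of an inch to the right on that item's line). The function names and the representation of responses as lists of button labels are assumptions made for illustration.

```python
def errors_before_correct(correct_button, presses):
    """Count wrong presses on one item; the item ends at the first correct press."""
    errors = 0
    for button in presses:
        if button == correct_button:
            return errors      # green light: learner may advance
        errors += 1            # red light: learner must try again
    raise ValueError("the item was never answered correctly")

def error_counts(answer_key, press_log):
    """Wrong-answer count for every item in the program, in item order."""
    return [errors_before_correct(k, p) for k, p in zip(answer_key, press_log)]

def pen_offset_inches(errors):
    """Horizontal pen travel on an item's line: 1/20 inch per wrong answer."""
    return errors / 20

# Hypothetical record for two items: two errors on the first, none on the second.
counts = error_counts(["B", "A"], [["A", "C", "B"], ["A"]])
print(counts, [pen_offset_inches(c) for c in counts])
```

Read this way, the length of each horizontal line on the graph is a direct count of the errors made on the corresponding item.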

The Item Set


The total program consisted of 120
items, 10 of which instructed the learner
how to operate the machine and informed
him of the goal to be attained. The remaining
110 items were constructed to
provide a sequence beginning with concepts
of squares, rectangles, length, and
width. Following items requiring the pupil
to indicate the number of square units in
rectangles, the concept of area was presented.
Applications included paint coverage,
rug size, and tile laying, followed by
practical problems of adding and subtracting
areas and finding the length or width
of rectangles. The set concluded with items
involving cost.²


Principles of Programming 


Several of the principles outlined by 
Skinner (1958) and illustrated in his com- 
pletion-item set were applied to this mul- 
tiple-choice approach. To illustrate the 
principles discussed below, a short se- 
quence from the program, Items 18 
through 29, is presented in Fig. 1. The 
original program was in color. 

1. The step from each item to the next 
in the sequence should be small enough so 
that the learner almost always gets each 
item right. Although Homme and Glaser 
(1958) and Coulson and Silberman (1959) 
found that smaller steps resulted in better 
learning and took less time per step than 
larger steps, definitive evidence on this 
issue has not yet been obtained. In this 
study it was assumed that if a pupil se- 
lected the wrong alternative to an item he 
did so either because he was improperly 
selected for the program or because of in- 
adequate prior learning in the program 
itself. The programmer should make sure 
that a pupil learns enough before an item 
is presented so that generalization to this 
item is practically assured. 

In opposition to this line of reasoning
is the argument in favor of higher item
difficulty that the learner is encouraged to
“formulate his own hypotheses and try
them out.” This procedure may have merit
if information is supplied to the student
showing why the alternative is incorrect
(Crowder, 1958) or if branching in the
program permits special remedial instruction.
But where, as in this study, the pupil
is informed only that he is wrong when he
makes an error, he is given no more information
than when he gets the item correct.
One fifth grade subject, after completing
the program in this study, commented,
“It’s hard to know why you get something
wrong. When I got it right, I knew it.
When I got it wrong, I didn’t know why.”
Since the absence of an explanation is likely
to heighten the aversive consequences of
failure, it appeared most desirable to
adopt a minimum difficulty level for the
items in this study. If it seemed helpful to
emphasize that certain responses were
wrong, instead of having the student learn
this by “being wrong,” special items were
constructed for this purpose, e.g., “Which
of the following figures is NOT a rectangle?”

² This program, consisting of a Kodachrome
strip film of 120 frames, has been deposited
with the American Documentation
Institute. Order Document No. 6080, from
ADI Auxiliary Publication Service, Library
of Congress, Washington 25, D. C., remitting
in advance $2.00 for microfilm, $16.25
for 35-mm. enlargement prints. Make checks
payable to Chief, Photoduplication Service,
Library of Congress.
2. Skinner’s use of the vanishing stimulus
and the prompt facilitates the occurrence
of the correct response. For
example, in Items 25 and 26 colored rows
of squares encourage the right answer. In
Item 27 these become merely dotted lines
which vanish completely in Item 28. In
Item 20 the correct answer “1 ft.” is
prompted by the dimension “1 ft.” in the
diagram.

3. To promote generalization within a
broad class of items, a process required
for understanding, the learner should acquire
a variety of verbal responses which
might later be used, through intraverbal
associations, to evoke other appropriate
responses. This is simply a process of
mediated or secondary generalization, a
complex example of which is provided by
Judd's theory of transfer through
principles. For instance, it was judged that
if the pupil learned to group squares
within a rectangle by rows, as in Items 25
through 28, he would acquire intraverbal
associations which would promote the
correct response in learning later in the
program to multiply the length by the
width to find the area and, still later, to
divide the area by the length to find the
width. In other words, with these items
the pupil is being prepared to


TEACHING UNDERSTANDING BY A TEACHING MACHINE

Fig. 1. A Sample Sequence of Items, Nos. 18-29, to Illustrate Programming Principles

18. Squares can be large or small. But if each side of a square is one foot long, this is called a square foot. Which figure is one square foot?

19. Here are two squares: □ □ They are exactly the same size. Suppose I put the squares together side by side like this. What kind of a figure will the two squares together make if I leave out the middle line like this?
    A. Another bigger square.
    B. A rectangle.

20. Here is a rectangle. It is 2 feet long and 1 foot wide. Suppose I draw a line in the middle, like this; I can make 2 squares. How long will each side of each square be?
    A. 1 ft.    B. 2 ft.    C. 3 ft.

21. Suppose you have a rectangle which is 3 feet long and 1 foot wide like this. How many squares, 1 ft. long on each side, can you make out of this rectangle?
    A. One    B. Two    C. Three

22. How many square feet are in a rectangle which is two feet high and one foot wide like this:
    A. One    B. Two    C. Three    D. Six

23. How many square feet are in a rectangle which is 2 feet wide and 4 feet long?
    Four
    Six
    Eight

24. Each side of this square is one inch long. This is called one square inch. How many square inches are in this rectangle which is one inch wide and five inches long?
    A. One    B. Two    C. Three    D. Four    E. Five

25. Here is a rectangle which is 2 inches wide and 4 inches long. How many square inches are there in this rectangle? There are 2 rows with 4 squares in each row. How can we write the total number of square inches?
    A. 2+2    B. 2+4    C. 2+4+2+4    D. 4+4

26. This rectangle is 4 inches wide and 7 inches long. How many square inches are in this rectangle?
    There are 4 rows with 4 squares in each row.
    There are 8 rows with 4 squares in each row.
    There are 4 rows with 7 squares in each row.
    There are 4 rows with 14 squares in each row.

27. This rectangle is 5 inches long and 3 inches wide. There are 3 rows of squares. In each row there are 5 square inches. How many square inches are there in the rectangle?
    A. 5+5    B. 5+5+5    C. 5+5+5+5

28. Here is a rectangle which is 5 in. wide and 9 in. long. How many square inches are there in the rectangle?
    A. 9+9+9+9+9    B. 5+5+5+5+5    C. 9+5+9+5    D. 5+5+9+9+9

29. A rectangle which is 5 in. wide and 9 in. long contains 9+9+9+9+9 square inches. What is a quick way of finding how many square inches this is?
    A. Add 9 and 5.
    B. Add 9 and 9 and 9 and 9 and 9.
    C. Multiply 9 by 5.
    D. Divide 9 by 5.


sight into why" you multiply in one case 
and divide in the other. Appropriate in- 
traverbal associations, such as verbal prin- 
ciples, definitions, or characteristics, should 
function to extend the pupil's learning to 
entirely new items which involve the 
same principle or concept. If this indeed 
can be accomplished, the use of multiple- 
choice items in automated teaching re- 
sults in something more than “mere recog- 
nition" of the right answer. 

4. Procedures or concepts which are not 
otherwise involved in the sequence of items 
should be reviewed periodically. For ex- 
ample, a review of the process of grouping 
squares into rows, originally presented in 
Items 25-28, was provided later by six 
items occurring at intervals throughout 
the program. 

5. Other techniques included the repe- 
tition of the correct answer on the suc- 
ceeding item (as in Item 29), the irregular 
appearance of "interesting" colored pic- 
tures accompanying the item, and the use 
of a variety of forms of the multiple- 
choice item. 
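
The row-grouping idea behind Items 25 through 29 can be sketched in a few lines of code. This is only an illustration, not part of the original program; the function names `area_by_rows` and `width_from_area` are hypothetical. The sketch counts unit squares one row at a time, and then recovers a missing dimension by successive subtraction of rows.

```python
def area_by_rows(rows: int, squares_per_row: int) -> int:
    """Count unit squares by adding one row at a time (the grouping used in Items 25-28)."""
    total = 0
    for _ in range(rows):
        total += squares_per_row
    return total


def width_from_area(area: int, length: int) -> int:
    """Find how many rows of `length` squares fit in `area` by successive subtraction."""
    rows = 0
    remaining = area
    while remaining >= length:
        remaining -= length
        rows += 1
    return rows


# Item 29: a rectangle 5 in. wide and 9 in. long, counted as 9+9+9+9+9.
area = area_by_rows(5, 9)
print(area, area == 5 * 9)
print(width_from_area(area, 9))
```

Adding equal rows gives the same total as multiplying, and removing rows one at a time is the division step that the program introduces later.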


Subjects 


Fourteen experimental Ss and 14 con- 
trols, individually matched on the basis of 
intelligence, sex, reading ability, and pre- 
test scores, were selected from the fifth 
and low sixth grades. All Ss showed compe- 
tence in multiplication and division but 
little acquaintance with the topic of area. 
A fifteenth S who had completed the pro- 
gram was dropped from the study because 
of an automobile accident prior to the 
posttest. The control Ss were given no 
special instruction of any kind. They were 
used to control for the effects of incidental 
learning such as that which might result 
from the administration of the pretest. 


Pretest and Posttest 


Both of these tests were of the free- 
answer or essay type. The pretest consisted 
of 12 problems involving multiplication 
and division, in addition to 8 problems 




dealing with areas of rectangles. The post- 
test consisted of the same eight problems 
of the pretest on area plus another eight 
most of which were more difficult. Sample 
problems on the posttest were: 


1. A gallon of paint will cover an area of
200 sq. ft. How long a stretch of fence could
you paint with 2 gallons if the fence 15
feet high?

2. A sheet of cardboard is 3 feet long.
It weighs exactly 4 ounces. What is the area
of this cardboard if it is 2 feet wide?

3. A sheet of paper is 10 in. wide and is
in. long. This sheet of paper is cut into
strips. The strips are laid end to end and
then joined with Scotch tape. This long
strip now looks like this:

Can anyone tell what the area of this
long strip is? ......
If so, what is it?


Procedure 


Experimental Ss operated the machine
for two or three periods on successive days.
The total time spent with the machine
ranged from one hour and 30 minutes to
slightly over two hours. The posttest was
given the day following the end of the
machine instruction to each experimental
S and his control.


Results 


The mean score of the experimental group on the posttest was 12.4, with a standard deviation of 5.6. The corresponding control mean was 5.4, with a standard deviation of 3.7. Since the posttest scores for the experimental group showed a much greater variance than did the control, a sign test was used. All except one of the experimental Ss showed a higher posttest score than did their matched controls, a difference which is significant. Although the experimental group answered every item, except one, on the posttest better than the control group, most of these pupils missed several items which were similar to those presented in the program; they learned far less than what had been expected.

The errors on the program ranged up to 118, with a mean of 54. Jane’s record, shown in Fig. 2, was the poorest performance and indicates very little learning as measured by the posttest. Part of her problem may have been inadequate ability in computation, since her computation score was relatively low. Even for a typical student like Kenneth, whose record is shown in Fig. 3, the program was too difficult; 40 mistakes is entirely too many according to the criterion adopted. The program appeared to be ideal for Byron, whose record appears in Fig. 4. Although he showed little ability in the field of area on the pretest, Byron’s performance on the posttest was outstanding. He was able to generalize from his training so well that on the posttest he solved completely new problems of obtaining the area of a parallelogram and a triangle.

Fig. 2. Graphic Record of the Poorest Performance. (Legend: Reading G.P. 6.6; Pre-Comp. 8; Pre-Area 1; Post-Area 5; Errors 118.)

Fig. 3. Graphic Record of a Typical Performance. (Legend: Kenneth; Grade A5; Reading G.P. 5.6; Pre-Area 2; Post-Area 13; Time 1 hr. 45 min.)

Fig. 4. Graphic Record of the Best Performance. (Legend: Byron; Grade A5; CTMM 138; Pre-Comp. 11; Pre-Area 2; Errors 3.)
The rank order correlation between total 
number of errors on the program and the 
gain on the posttest was -.83. While this
of course supports the hypothesis that the
optimum difficulty level of each item
should be low, it does not permit, in itself,
any such conclusion. The rank order
correlation of mental age was .52 with gain
on the posttest and -.79 with number of
program errors. On the basis of this lim- 
ited sample it appears that the program 
was more appropriate for the brighter 
children. 
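
A rank order correlation of the kind reported here can be computed as follows. This is a minimal sketch that assumes no tied scores; the five pairs of values are hypothetical and are not the pupils' data, which the article does not list.

```python
def ranks(values):
    """Return the rank (1 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank order correlation for untied data: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical program errors and posttest gains for five pupils.
program_errors = [10, 25, 40, 70, 118]
posttest_gain = [12, 11, 9, 4, 5]
print(round(spearman_rho(program_errors, posttest_gain), 2))
```

A strongly negative value, as in the article, means that pupils who made more errors on the program tended to gain less on the posttest.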

Desirable Revisions in the Program

In the absence of definitive evidence on the question, it appears that the major weakness of this program was that it was too difficult for most of the pupils. Revisions should be made along the following lines:

1. Since the reading load was probably a major obstacle for many pupils, sentences should be shorter and the total amount of reading less for each item. Possibly several versions of the program, at different reading levels, would be desirable.

2. The steps in many if not most cases 
could be smaller. For example, Item 22, 
which was missed by three pupils, could 
be preceded by items analogous to 19 and 
20. Item 24, also missed by three pupils, 
could be divided into two items, the first 
introducing the concept of square inch 
only. Item 25 could be rewritten as two 
or even three items. 

The greatest misapplication of the prin- 
ciple of small steps occurred in the latter 
part of the program. For the group of 15 
pupils who performed the item set, on the 
first 10 items there was a total of five 
errors on the first attempt (out of a pos- 
sible 150). On succeeding sets of 10 items 
this number of first-attempt failures in- 
creased until it reached 66 for the last 
set of 10 items. The relatively high diffi- 
culty of the items in the latter part of the 
program appeared to result from the fact 
that these more complex items required a 
diversity of other understandings, abilities 
which were not tested for in the pretest. 
For example, although pupils were se- 
lected on the basis of their ability to 
divide, many failed to relate division to the 
process of successive subtraction. The 
program failed to provide adequate intro- 
duction or review of these concepts. 

Faced with this type of problem a pro- 
grammer can either use a more adequate 
pretest to provide for better selection or 
he can add the additional items as re- 
quired. When a great variety of item sets 
in arithmetic become available, the prob- 
lem largely disappears. The prerequisites 
to any item set can be stated in terms of 
the successful completion of previous item
sets. Conversely, the record obtained for
a pupil with one item set might be used 
diagnostically to indicate what the next 
set, remedial, optional, or otherwise, might 
be. 

3. A wider variety of items should be 
used for each new process. For instance, 
Items 25-28 should be supplemented with 
items which provide verbal statements 
as alternatives and more familiar illustra- 
tions of the problem. As another example, 
on many of the posttest questions asking 
for an area, Ss wrote only the correct 
number failing to indicate the square units 
involved. Although six items had been 
presented in the program to reduce this 
kind of error, these items unfortunately 
were all stated in exactly the same way.
The item stem called for an area and
among the alternatives one distractor
listed only the correct number. Although
completion items may be necessary to
teach this type of behavior, better results
in this program could probably have been
obtained if, instead of the single form, a
variety of multiple-choice forms had been
used, e.g., use of “none of these” as an
alternative, asking “What is left out of
this answer?” or asking for the
appropriate rule.
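
The rise in first-attempt failures described under the second revision can be expressed as an error rate per set of 10 items. The sketch below assumes that each set offered the same 150 opportunities (15 pupils by 10 items) that the article states for the first set; the function name is hypothetical.

```python
def first_attempt_error_rate(errors: int, pupils: int = 15, items: int = 10) -> float:
    """Proportion of first attempts that were wrong within one set of items."""
    return errors / (pupils * items)


first_set = first_attempt_error_rate(5)   # first 10 items: 5 errors out of 150
last_set = first_attempt_error_rate(66)   # last 10 items: 66 errors, assuming 150 attempts
print(f"first set: {first_set:.1%}, last set: {last_set:.1%}")
```

On these assumptions the first-attempt failure rate rises from roughly 3 per cent on the first set to 44 per cent on the last.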


Conclusion 


The use of multiple-choice items in automated teaching appears to have some effectiveness under these conditions in teaching understanding, as herein defined, although the criterion was a free-answer test. Although the average pupil did not show as high a degree of competence on the posttest as expected, the program was also far more difficult than intended. Since there appeared to be a strong relationship between success on the program and gain on the posttest, before the limitations and advantages of the multiple-choice method used in this study may be assessed, the program should be revised to include smaller steps and a greater variety of items. Whether the best performers on the present program would have learned more from such a longer and simpler revision remains to be determined; to accommodate individual differences in ability two or three versions of the program may be desirable.

SUMMARY 


Fourteen elementary school pupils responded individually to a set of 110 multiple-choice items in a teaching machine. The performance of each child was graphically recorded. Subjects performed significantly better on a test of understanding of areas of rectangles than did their matched controls who received no planned instruction on this topic. The principles of programming are discussed and illustrated. Suggestions given for the revision of the program, which appeared to be too difficult for most pupils, include the introduction of smaller steps and a greater variety of types of multiple-choice items.


REFERENCES

Coulson, J. E., & Silberman, H. F. Results of initial experiment in automated teaching. Santa Monica, Calif.: System Development Corp., 1959.

Crowder, N. A. Automatic tutoring by means of intrinsic programming. Paper read at the Air Force Office of Scientific Research and the University of Pennsylvania Conference on the Automated Teaching of Verbal and Symbolic Skills, Philadelphia, December, 1958.

Ferster, C. B., & Sapon, S. M. An application of recent developments in psychology to the teaching of German. Harv. educ. Rev., 1958, 28, 58-59.

Homme, L., & Glaser, R. Relationships between the programmed textbook and teaching machines. Paper read at the Air Force Office of Scientific Research and the University of Pennsylvania Conference on the Automated Teaching of Verbal and Symbolic Skills, Philadelphia, December, 1958.

Porter, D. A critical review of a portion of the literature on teaching devices. Harv. educ. Rev., 1957, 27, 126-147.

Pressey, S. L. A machine for automatic teaching of drill material. Sch. Soc., 1927, 25, 549-552.

Skinner, B. F. Science of learning and the art of teaching. Harv. educ. Rev., 1954, 24, 86-97.

Skinner, B. F. Teaching machines. Science, 1958, 128, 969-977.


(Received July 6, 1959) 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 6, 1959 


TRANSFER FROM CONTEXT BY SUBTHRESHOLD SUMMATION¹


GEORGE M. HASLERUD 
University of New Hampshire 


Transfer needs a more adequate model. 
None of the three common paradigms of 
transfer, very well outlined by Vanderplas 

(1958)—the Osgood surface, generalization, 
and mediating association—has yet solved 
the crucial problem of similarity. Moreover, 
there is yet no evidence that these para- 
digms can be applied fruitfully to situations 
more complex than the isolated motor or 
verbal situations where they originated and 
have been tested. Is there a possibility 
that these models have concentrated on 
problems which may be only tangential to 
the phenomena of transfer occurring in 
such out-of-the laboratory contexts as al- 
lusion, analogy, and relevance? 

The twin problems of similarity and 
meaning both turn upon how a new situa- 
tion is perceived by the subject (S). As 
soon as the relationship is perceived or the 
point is seen as similar to something known 
before, then the remainder of the learning 
task is no longer transfer but a routine or- 
ganization of the details and an attainment 
of skill enough to meet the criterion. In 
other words, the "transfer" could be ex- 
pected to manifest itself only in the first 
meeting or two with a new situation, as it 
does in the Gestalt insight problems, but its 
influence would largely be masked when 
total trials to criterion is the comparative 
measure. 

A perceptual model of transfer based on 
context would, therefore, seem a natural 
solution. But before such a model can be 

formulated, experimental tests of percep- 
tual behavior in substantial contexts must 
be made. 

Most studies of context have not gone beyond a single synonym as context for a test word (Cofer & Shepp, 1957), or the beginning of an associative chain (Russell & Jenkins, 1955), or pre- and post-alternative words for a picture exposed tachistoscopically (Lawrence & Coles, 1954), or a cluster of words in the preexposure field, meaningful or nonmeaningful, depending on the relation to the subsequent stimulus word (Cohen, Gofstein, & Casey, 1959). Some found interference from contexts like these, while others found facilitation, especially if the relationship was close (Cofer & Shepp, 1957). A special kind of context, familiarity of a word as evidenced by recognition of its definition, was found to help perception of nine-letter words exposed for 40 ms. (Haslerud & Clark, 1957).

¹ This research was supported by a grant from the Central University Research Fund of the University of New Hampshire.

The studies of more complex context are very limited in number. In 1954, the experimenter reported at the annual meeting of the Japanese Psychological Association a pilot study of his half-story technique, in which he had the collaboration of Koji Sato and graduate students, M. Akita and Kambe, of Kyoto University.² Japanese kindergarten children transferred to the second half of an illustrated story meanings and inferences from the first half, as shown by contrast with those who met only the second half. If the story experimentally reached an apparent ending after the first half, however, little was transferred to the unexpected second half the next day. The study was only suggestive of the possibilities because of such problems as keeping a straight learning approach to transfer from becoming merely rote. Search for a bypass of the difficulties led to the designing of a direct perceptual test of transfer. Such a test has been done by Miller (1956) with an exposure of .15 sec. He found that sentences were more readily perceived when the successive sentences told a story than when they were in a random order. But the continuity may have been attenuated by this fragmentary parceling out of meaning, especially of a very simple story for adults. In fact, the best record was by an S who did better on the random presentation. Evidently more complex and meaningful material is needed if one wishes to study the ephemeral, everyday transfer.

² The extensive pilot experiments in 19.. were made possible by the Supplemental Educational Allowance provided by the United States Educational Commission in Japan (Fulbright).

The purpose of this experiment is to determine the transfer effect of such a context on the subsequent phrase presented tachistoscopically. The first hypothesis to be tested is that the preceding context will facilitate the perception of the continuation phrase. The second hypothesis is that context which is relevant will facilitate, while that which is irrelevant will not.


PROCEDURE 


In order to meet the need for meaningful material in sufficient number and uniform enough to allow statistical test, it was decided to use the classical limerick which, unlike the Lear variety, has a last line different from the first. This material is of interest to most Ss, usually has a high frequency count, and while the last line has no words identical with those in the first four lines, it does have a continuity of meaningful (even if sometimes farfetched) relationships, rhythm, and rhyme. From Reed (1925), 18 unfamiliar limericks were chosen whose first lines begin:

A maiden at college, named Breeze; A menagerie came to our place; A thoughtful old man of Lahore; There was an old lady who said; There was an old bore of Torbay; A ... of much savoir-faire; Some amateur players, most brave; There was a young ...; A pony, renowned for his ...; There was a young lady of Florence; There was a young lady named Bright; An epicure, dining at Crewe; There was a young lady of Kent; There was a young ... of Kew; There was a young man who said “Damn”; A goddess, capricious, is Fame; There was a young lady named Kate; There was a bad schoolboy who baited.


Each S had the characteristics of limericks 
recalled to his attention by preliminary 
analysis of the familiar limerick, “There 
was a young lady of Niger,” and the process 
of anticipating the fifth line, by the un- 
familiar, “There was a young girl of Na- 
varre." 

The following procedure introduced each 
limerick: The S was instructed to accentu- 
ate rhythm and rhyme as he read aloud 
twice the typed first four lines. Following 
the suggestion to the S that he anticipate 
the fifth line, the E said, “In the tachis- 
toscope you may see the last part of the 
fifth line of the limerick, or it might be 
something else. But report anything you 
see, even if it seems incomplete.” For pre- 
sentation in the tachistoscope at 24 in., the 
last half-line (usually three words) was on 
a separate sheet ink-lettered one-eighth in. 
high at fixation level and to the right of 
center of the midline. 

A Harvard-Dodge tachistoscope had its pre-exposure field set at 40 sec. and the exposure time for stimulus material determined for each S. The limens for seven preliminary three-word endings of sentences similar to the stimulus materials were obtained by increasing the exposure time by .01 sec. each trial until the S could report perfectly. Then three-fourths of his lowest time was set as the unvarying exposure interval for the experimental last half-lines, with trials repeated every 10 sec. The Ss were not informed that all exposures were the same, and their inference that the time was being lengthened was fostered by E’s ostentatious adjustment of an irrelevant disc of the apparatus while he said, “Let's try this now.” At this subliminal level, about five exposures at the 10-sec. separation are needed for unrelated three-word phrases to reach threshold level, and a facilitating factor has room to indicate a quicker response. If the S did not give a perfect report in 10 trials, the E proceeded to the next limerick. If the three-fourths formula proved inadequate for an S as shown by wide variation from five repetitions for unrelated material, the constant was changed. Out of 80 Ss, two required lowering to .6 and two required raising to .9.
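
The exposure rule and the scoring of trials described above can be sketched as follows. The limen values and the function names are hypothetical; only the three-fourths constant, the 10-trial limit, and the use of the lowest preliminary limen come from the procedure as described.

```python
def exposure_setting(limens_sec, constant=0.75):
    """Fixed subliminal exposure: a constant fraction of the S's lowest preliminary limen."""
    return constant * min(limens_sec)


def trials_to_threshold(reports, max_trials=10):
    """Return the number of the first trial with a perfect report,
    or None if no perfect report occurs within max_trials exposures."""
    for trial, perfect in enumerate(reports, start=1):
        if trial > max_trials:
            break
        if perfect:
            return trial
    return None


# Hypothetical limens (in seconds) from seven preliminary sentence endings.
limens = [0.10, 0.08, 0.12, 0.09, 0.11, 0.08, 0.10]
print(round(exposure_setting(limens), 3))

# Hypothetical S who first reports the half-line perfectly on the fifth exposure.
print(trials_to_threshold([False, False, False, False, True]))
```

The score for each limerick is simply the number of identical exposures needed, so a facilitating context shows up as a smaller count.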
Although temporal and areal summation 
have a long history in neurology and physi- 
ology, this experimenter has been unable to 
find any previous perceptual studies of 
comparative stimulus thresholds which 
present an unvarying subliminal stimulus 
at equally spaced intervals until sufficient 
exposures permit a correct response. The 
method of this experiment was empirically 
worked out only to the point where it was 
sensitive, economical of time, and less 
fatiguing for E and S than more conventional
psychophysical methods. The comparisons
of frequencies of exposures for two
conditions allow the testing of each of our 
two hypotheses, but a research is planned 
to study systematically the relation of 
thresholds reached by this kind of summa- 
tion with those reached by progressive 
lengthening of exposure for successive 
trials. 

The Ss were 80 college students divided so that 20 male and 20 female students were used to test each of the two hypotheses. Because of the hour to hour and a half required for the individual testing of each S and to improve the meeting of appointments, the Ss were paid for their cooperation.

TABLE 1
SUBLIMINAL EXPOSURES TO REACH THRESHOLD FOR FIFTH LAST HALF-LINE OF NINE LIMERICKS PRECEDED BY OWN FOUR LINES AND OF NINE PRECEDED BY NONE
(Each S was own control)

Group   N    Own four lines   None   SE diff.   t      p       Ss in expected direction
M       20   4.28             5.61   .28        4.75   <.001   18
F       20   3.71             4.69   .23        4.96   <.001   18


Each S as his own control had nine lim- 
ericks in one condition and nine in the other, 
e.g., nine relevant and nine irrelevant. As a 
further control, each limerick was presented 
in both conditions, e.g., 20 Ss had it in the 
relevant condition and 20 in the irrelevant. 
The same was true for the group that tested
the influence of presence or absence of 
previous context. For the relevance-irrele- 
vance study, the fifth lines were randomized 
for each S by a table of random numbers, 
but for presence-absence, the two condi- 
tions were alternated. 


RESULTS 


For each S the data consisted of the number of trials to attain a perfect response in each of the two conditions. Because each S was his own control, the difference between his scores includes the correlation factor for related samples. The significance of the mean difference score from null difference was found. Table 1 indicates that the perception of the last half-line of a limerick required significantly fewer subliminal exposures when preceded by its first four lines than when alone. Furthermore, as can be seen from the right hand column, 36 out of 40 Ss individually showed this same direction, with the other 4 either exhibiting no difference or a slight difference in favor of the isolated half-line. Hypothesis I, that preceding meaningful context has a facilitating effect on subsequent perception, is thus well supported.
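
The test of the mean difference against a null difference can be sketched as a related-samples t. The summary check below uses the first row of Table 1 (means of 4.28 and 5.61, with .28 and a t of 4.75) and assumes that .28 is the standard error of the mean difference; the per-subject scores passed to `paired_t` are hypothetical, since the article reports only group summaries.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(with_context, without_context):
    """t for related samples: mean of the per-S differences divided by its standard error."""
    diffs = [b - a for a, b in zip(with_context, without_context)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


def t_from_summary(mean_a, mean_b, se_diff):
    """t recovered from two condition means and the standard error of their difference."""
    return (mean_b - mean_a) / se_diff


# Check against the first row of Table 1, under the assumption stated above.
print(round(t_from_summary(4.28, 5.61, 0.28), 2))

# Hypothetical exposures for four Ss in the two conditions.
print(paired_t([3, 4, 5, 4], [5, 5, 7, 6]))
```

The summary check returns 4.75, which matches the tabled value and supports reading the third figure in the row as the standard error of the difference.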

But not every context will give a perceptual advantage. As proposed in Hypothesis II, the preceding context must be relevant to the exposed half-line. In Table 2 one can see that relevant endings were perceived significantly faster than irrelevant. Also the tendency was true for the 40 Ss individually, well beyond chance expectancy (p<.001).

Since the limericks were balanced so that half the group had, e.g., the relevant condition and the other half the irrelevant, a comparison can be made for limericks just as has been done in Tables 1 and 2 for Ss. The results are not given in detail here except to say that for both groups five-sixths or more of the limericks were in the expected direction.

There are no sex differences in Table 2 (both conditions), but in Table 1 it is apparent that the males were significantly (p<.01) slower than the females when there was no preceding context. The males in Table 1 did not differ when context was present from males in Table 2, where there was the similar relevant situation. The males' differences between absence of context and irrelevance were significant at the .01 level.

The ratios of the relevant score to the combination of the relevant plus irrelevant and of the score with context to the combination of context plus no-context distribute themselves in a normal-like fashion with mean at .44 and SD of .08. The characteristics of Ss differing in these ratios have not been investigated, but they may give further clues to the nature of ability to transfer.
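
The ratio described above can be illustrated with the group means in Table 1. This is only a sketch: the article computes the ratio for each S, whereas the values below are group means, so the results approximate but do not reproduce the reported mean of .44. The function name `context_ratio` is hypothetical.

```python
def context_ratio(with_context: float, without_context: float) -> float:
    """Share of the combined exposures that fell in the context condition.
    Values below .5 mean fewer exposures were needed with context."""
    return with_context / (with_context + without_context)


# Group means from Table 1 (own four lines vs. none).
males = context_ratio(4.28, 5.61)
females = context_ratio(3.71, 4.69)
print(round(males, 2), round(females, 2))
```

A ratio of exactly .5 would indicate no effect of context, so the distance below .5 gives a simple index of how much an S profits from context.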


Discussion 


Both hypotheses were supported at a high confidence level. Variations in the difficulty of the limericks and variations in the ability to catch the meaning to be transferred were cancelled out as factors by making limericks and Ss their own controls. Also, the number of limericks and the number of Ss were sufficiently large to give consistent results and highly significant p's. With such adequate controls for the testing of Hypothesis II, the conclusion, at least for a college population, that relevant context facilitates perception can be accepted.

TABLE 2
SUBLIMINAL EXPOSURES TO REACH THRESHOLD FOR FIFTH LAST HALF-LINES OF LIMERICKS WITH NINE RELEVANT AND NINE IRRELEVANT PRIOR CONTEXTS OF FOUR LINES
(Each S own control)

Group   N    Relevant   Irrelevant   SE diff.   t      p       Ss in expected direction
M       20   3.75       4.68         .22        4.23   <.001   16
F       20   3.52       4.51         .23        4.81   <.001   ...

While the major controls for testing Hypothesis I were the same as for Hypothesis II, the experimental setting for Hypothesis I lacked the warm-up from reading the first four lines of the limerick in the “None”
situation. Examination of Tables 1 and 2 
would tend to discount the importance of 
warm-up as an explanation of the difference 
in exposures between presence and absence 
of context. The female students give the 
same results in Hypothesis II where it was 
controlled as in Hypothesis I where it was 
not. While the males in the similar “lim- 
erick” and “relevant” situations in Hy- 
potheses I and II, respectively, had a dif- 
ference in the means which was far from 
significant even at the .05 level, they needed 
significantly more exposures (p «.01), when 
given no preceding context, than the males 
for Hypothesis II where the context was 
irrelevant. A possible explanation for the 
males’ slow perception of the isolated 
stimulus phrase may be the need for more 
cues to be summated when language re- 
sources are restricted, as has often been 
reported for males compared to females 
(e.g., Haslerud & Clark, 1957). 


CONCLUSION

What, then, can one infer is transferred from context to facilitate the perception of the last half-line of this experiment? The non-Lear limericks used do not have identical nor even similar words but only rhyme, rhythm, and meaning to connect them to the previous context. The pursuit of relevance is probably like the narrowing of alternatives from familiarity with a word (Haslerud & Clark, 1957) or the presentation of limited alternatives (Lawrence & Coles, 1954) in its effect on the readiness to
be stimulated in a particular way. The 
anticipative set established by meaningful 
context apparently makes fragmentary 
stimuli sufficient when they fit, as for a 
rhyme. That the set is a projective anticipa- 
tion seems plausible when one considers 
the negative cases, e.g., where the S attains 
a goal and thereafter transfers little (Hasle- 
rud, 1950) or the similar case of the passive, 
noninvolved S (Haslerud & Meyers, 1958). 
From the present experiment some of the 
broad outlines of a perceptual theory of 
transfer can be surmised. But to complete 
the foundation of the theory and to deduce 
its practical implications, much more work 
will have to be done. 


Summary 


An experimental base for a perceptual 
model of transfer requires testing of two 
hypotheses: that preceding meaningful con- 
text will facilitate perception of a continua- 
tion phrase and, secondly, that the context 
must be relevant. Eighty college students 
of both sexes were shown individually in a 
tachistoscope the last half-lines of 18 non- 
Lear limericks. The exposures were set at 
3/4 of each S's stimulus threshold for similar
material, with trials repeated every 10
sec. until a perfect report. The context was 
the first four lines of the limerick read 
aloud twice by the S. Each S acted as his 
own control for the two conditions needed 
to test one of the hypotheses, and each 
limerick was similarly balanced within each 
group for the two conditions. Forty Ss 
were used to test each hypothesis. Both
hypotheses were supported at p<.001 in
the groups and with 80%-90% of the
individual Ss in the expected direction. The
results point to a readiness for even sublim- 
inal cues when a projective anticipation 
has been established by relevant context. 


REFERENCES 


Cofer, C. N., & Shepp, B. E. Verbal context
and perceptual recognition time.
Percept. mot. Skills, 1957, 7, 215-218.

Comen, M. L., Gorsrgis, A. G., & CAN 
T. M. The effect of a meaning set s 
visual recognition thresholds. Pap i 
read at Eastern Psychol. Ass, APT 
1959). f Nes 

Hasterup, G. M. Properties of bi-dir d 
tional gradients at + o'er J. gen 
Psychol., 1950, 29, 67-76. 

Haslerud, G. M., & Clark, R. E. On the
redintegrative perception of words.
Amer. J. Psychol., 1957, 70, 97-101.

Haslerud, G. M., & Meyers, S.
The transfer value of given and individually
derived principles. J. educ.
Psychol., 1958, 49, 293-298.

Lawrence, D. H., & Coles, G. R. Accuracy
of recognition with alternatives before
and after the stimulus. J. exp. Psychol.,
1954, 47, 208-214.

Miller, E. E. Context in the perception of
sentences. Amer. J. Psychol., 1956, 69,
653-654.

Reed, L. The complete limerick book. New
York: Putnam, 1925.

RussELL, W. A., & Junxins, J. J- $ (Con- 
nesota Tech. Rep., 1955, No. 1 
tract NSONR-66216). bind and 

Vanderplas, J. M. Transfer of training and
its relation to perceptual learning and
recognition. Psychol. Rev., 1958, 65, 375-
385.




(Received April 8, 1959) 




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 6, 1959


ADULT AGE DIFFERENCE IN PERFORMANCE 
ON A VISUAL-SPATIAL TASK OF 
STIMULUS GENERALIZATION¹


FRANKLYN N. ARNHOFF²
Mental Health Research Unit, Syracuse, N. Y. 


Numerous investigations of the learning and performance of
elderly subjects (Ss) in a variety of situations have tended to
indicate that older persons do not learn or perform as well as
younger Ss, although the effects of chronological age differ
widely with different tasks (Welford, 1958). While [...] and
slowing of [...] have been reported frequently for older Ss
(Welford, 1958), the differences between divergent age groups
may be quite small.

Watson (1954) hypothesized that stimulus generalization would
be less in aged as compared to younger persons, basing this
prediction on the concept that the older person narrows his
range of interests to concentrate on essential activities,
ignoring the rest. Discussing the unconscious [...] that are the
concomitants of aging, Weinberg (1956) generalized that the
aging organism, with lessened psychic energy to deal with all
stimuli, begins to exclude them. If such exclusion of stimuli
occurs, then the performance of aged Ss would be more stimulus
bound, with the expectation of decreased stimulus
generalization. However, a study on stimulus generalization and
age by Smudski and Braun (1956) did not demonstrate differences
in the stimulus generalization performance between young and
old Ss, findings which may be related to Welford's (1958)
observation that, since "easy" tasks make little apparent
demand upon organically based capacities or past experience,
people of all ages would perform about equally well on such
tasks.

¹ Presented at the Interhospital Conference, New York State
Department of Mental Hygiene, Syracuse, N. Y., April 1959.

² Grateful acknowledgement is due Irving [...] and Isabel
McCaffrey for their helpful suggestions and statistical advice,
and to the members of the Wagon Wheel for cooperation in
serving as subjects.

The purpose of the present study was to 
contrast the performance of old and young 
Ss on a visual-spatial task of stimulus 
generalization, specifically hypothesizing 
that while the young and old would have 
similar patterns of stimulus generaliza- 
tion, the errors would be consistently fewer 
for the older than the younger Ss. 

Experiential and practice factors have 
been emphasized in comparative age stud- 
ies as it is assumed that the older person 
will be under some handicap in novel situ- 
ations which interfere with prior learning 
(Ruch, 1934; Korchin and Basowitz, 
1957). Because of these factors a learning 
task was selected to be relatively simple 
in its psychological and motor demands, 
to have relatively simple task instructions, 
to require relatively little memory, and to 
be apparently free from past learning or 
experience. While findings from such a 
specific task may be difficult to generalize 
to other and more practical learning situa- 
tions, it may lead to greater understanding 
of basic differences between age groups. 


PROCEDURE AND SUBJECTS 


The task was essentially that of making 
choice reactions on an apparatus modified 
from that of Brown (1951) and used sub- 
sequently in other studies (Bilodeau, Brown, 
& Meryman, 1956; Arnhoff & Loy, 1957).
On a 3' × 14" plywood panel, painted flat
black, seven 4-watt, 6-volt lights were
mounted horizontally, equally spaced 4 in.
apart. A red neon pilot lamp, 2 in. above the
center lamp, served as a fixation point and
as a ready signal. The pilot lamp was turned
on in random order, 3 to 5 sec. before the
lighting of a stimulus light. S sat 5 ft.
away holding a reaction button in the preferred
hand. Response latency was measured
to the nearest .01 sec. by means of a Standard
Electric Timer which was wired to begin
timing with the lighting of a stimulus light
and to stop when S reacted by pushing the
reaction button. The experimenter (E) sat 
behind the panel. By means of switches, 
stimulus lights could be turned on in pre- 
determined order after the ready light. 

After the S was comfortably seated at 
the proper distance, in a semidarkened room, 
instructions were read specifying that the 
experiment was to study individual speed 
and accuracy. Each person was told to re- 
act as quickly and accurately as possible to 
the lighting of the center light by pressing 
the button he was holding. Each S was in- 
formed that other lights on either side of 
the center light occasionally would be lighted 
but the S was not to respond to them. If an 
error were made, S was to continue to re- 
spond as quickly and accurately as possible 
to the successive lights. 

Before reading the instructions, E lighted 
the ready light to ascertain if S could see the 
light without difficulty. After S’s affirmative 
response, he was requested to fixate on the 
ready light while first Light 1, then Light 7, 
and finally both Lights 1 and 7 simultane- 
ously were lighted to determine if S could 
see the lights without difficulty. From S's 
response, it was reasonable to assume that 
all Ss used in the study were able to see all 
stimulus lights. 

First, Ss were given 25 training trials to
the center light (#4) only. After the training
series, without interruption, the test series
began in which each of the three lights to
the left and right of the center light was
presented four times, making 24 test trials
to lights other than the center light. The 24
test trials were interspersed among 59 additional
trials to the center light (#4), in
six different orders, each beginning with a
different one of the six test lights. In this
manner, each test light appeared as the first
test light after the training trials, an equal 
number of times. On each order, 9 old and 
10 young Ss were tested, with Ss assigned to 
the different orders of presentation in suc- 
cessive order of their availability. 
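The construction of the six presentation orders can be sketched in code. This is a minimal illustration, not the author's procedure: the text says only that the orders were predetermined, so the random scattering of test trials among the center-light trials, the function name `build_test_series`, and the `seed` parameter are all assumptions.

```python
import random

CENTER = 4                         # training light (center of the seven-light panel)
TEST_LIGHTS = [1, 2, 3, 5, 6, 7]   # the six peripheral test lights


def build_test_series(first_test_light, n_center=59, reps=4, seed=0):
    """Return one 83-trial test series whose first test trial is `first_test_light`.

    Assumption: test trials are scattered at random among the center-light
    trials; the source only states that the orders were predetermined.
    """
    rng = random.Random(seed)
    # 24 test trials in all: four presentations of each peripheral light.
    rest = [light for light in TEST_LIGHTS for _ in range(reps)]
    rest.remove(first_test_light)          # one presentation is reserved as the first test trial
    rng.shuffle(rest)
    tests = [first_test_light] + rest
    # Pick which of the 83 positions carry a test trial, in ascending order.
    total = n_center + len(tests)
    test_positions = sorted(rng.sample(range(total), len(tests)))
    series = [CENTER] * total
    for pos, light in zip(test_positions, tests):
        series[pos] = light
    return series


# Six orders, each beginning its test trials with a different test light.
orders = {light: build_test_series(light, seed=light) for light in TEST_LIGHTS}
```

Each series contains 59 center-light trials and 24 test trials, and the first non-center light in `orders[k]` is light `k`.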

Stimulus generalization is defined opera- 
tionally in terms of the number of responses 
made to lights other than the training light 
(center light). As response to the center 
light is the required response, responses to 
other lights are considered as errors. According
to theory (Hovland, 1951), a gradient
of generalization was expected, with fewer
errors to the lights progressively further
from the center light.




The young Ss were 60 volunteer or practicing nurses. A local
social club for older persons provided 54 volunteer older Ss.
The older volunteers from a select, nonhospitalized sample of
older persons probably supplied a more adequately functioning
group of Ss than frequently is used in studies of this sort.
The age range for the young group was 19-27 years (X̄ 20.4,
σ 1.2), and for the old group, 60-82 years (X̄ 70.1, σ 7.6).
All the young group was female; the old group consisted of 22
males and 32 females. As no sex differences could be
demonstrated on the dependent variables of errors and reaction
time, data for the male and female Ss in the old group were
pooled.

No consideration of differences in education, socioeconomic
class, etc. was given since indices of such variables appear to
have different meanings when applied to young and old (Arnhoff,
1955; Dennis, 1953). Verbal intelligence does not seem to be
related to performance on this task (Arnhoff and Loy, 1957)
and, therefore, was not measured.


RESULTS
Analysis of Errors


Table 1 shows the percent of responses made to each light by
each group. Except for a minor reversal at Light 3, the old
group was in all instances less responsive than the young. The
average of the differences between comparable percentages
(light by light) was 4.0, a value which is significantly
different from zero at the .05 level of confidence (t = 3.01,
5 df). Table 2 presents the percentage of Ss responding to each
light at least once, and shows fewer Ss responding in the older
group. In this instance the average of the differences between
comparable percentages was 14.25, a value which is
significantly different from zero at the .01 level of
confidence (t = 4.67, 5 df).
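The two t values reported above can be checked with a short calculation. The sketch below assumes the test was a one-sample t on the six light-by-light differences (young minus old, Lights 1-3 and 5-7), using the percentages as read from Tables 1 and 2; the function name `paired_t` is illustrative only.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(young, old):
    """Mean difference, t statistic, and df for paired percentages."""
    diffs = [y - o for y, o in zip(young, old)]
    d_bar = mean(diffs)
    t = d_bar / (stdev(diffs) / sqrt(len(diffs)))   # stdev uses n - 1
    return d_bar, t, len(diffs) - 1


# Percentages for the six peripheral lights (1, 2, 3, 5, 6, 7).
table1_old = [10.6, 8.3, 21.3, 13.9, 9.2, 9.2]
table1_young = [12.1, 11.3, 21.0, 18.8, 17.9, 15.4]
table2_old = [35.2, 25.9, 51.8, 42.6, 29.6, 27.8]
table2_young = [40.0, 36.7, 68.3, 51.7, 55.0, 46.7]

d1, t1, df1 = paired_t(table1_young, table1_old)
d2, t2, df2 = paired_t(table2_young, table2_old)
```

Under these assumptions the mean differences come out at 4.0 and 14.25, and the t values at roughly 3.0 and 4.67 with 5 df, in line with the reported figures.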

Comparison of the two distributions of error scores (Table 3)
yielded a χ² [...] which, with 4 degrees of freedom, is
significant at better than the .0[...] level of confidence
(cells were combined where expected values were too small),
indicating further the differences in responsivity between the
two groups. Examination of the error distributions for the two
groups, however, reveals all Ss in the young group making at
least one error, while 9 of the old Ss made no errors at all
(zero error scores). The probability of obtaining such a
difference in proportions (errors vs. errorless) by chance
alone is less than .001 (Fisher's Exact Method).
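The exact probability quoted above can be reproduced from the hypergeometric distribution. The sketch below is a minimal one-tailed version of Fisher's exact test written for this 2 × 2 layout; the function name `fisher_exact_upper` is hypothetical, and the one-tailed reading is an assumption, since the text does not say which tail convention was used.

```python
from math import comb


def fisher_exact_upper(a, b, c, d):
    """One-tailed Fisher exact p for the table [[a, b], [c, d]].

    Returns the probability of observing `a` or more in the top-left
    cell when all four marginal totals are held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(row1, k) * comb(row2, col1 - k) / denom
    return p


# Rows: old, young. Columns: errorless, made at least one error.
p = fisher_exact_upper(9, 45, 0, 60)
```

With 9 of 54 old Ss and 0 of 60 young Ss errorless, `p` falls below .001, consistent with the statement in the text.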

Although the results thus far confirm the expectation of less
responsivity (fewer errors and therefore by definition less
stimulus generalization) in the old group, the unexpected and
significant difference in the number of error-free performances
suggested the desirability of further analyses of the data to
examine and contrast the performance of the two groups when
error was made, thus excluding from these analyses those old Ss
who made no errors. Since studies using similar equipment
(Mednick, 1955; Smudski & Braun, 1956) have reported few if any
zero error scores, there was the possibility that these
error-free Ss were atypical in other ways also. This approach
to the data seemed to offer a safeguard against conclusions
assumed valid for the majority, yet which might be markedly and
disproportionately influenced by a few, possibly atypical Ss.


TABLE 1
PERCENTAGE OF RESPONSES TO ALL LIGHTS

                              Lights
Subjects    N      1      2      3      4      5      6      7
Old         54   10.6    8.3   21.3    100   13.9    9.2    9.2
            45   12.8   10.0   25.6    100   16.7   11.1   11.1
Young       60   12.1   11.3   21.0    100   18.8   17.9   15.4

Note. In the second row of data for the old group are the
percentages based on omission of the 9 zero-error Ss.


TABLE 2
PERCENTAGE OF Ss RESPONDING TO EACH LIGHT AT LEAST ONCE

                              Lights
Subjects    N      1      2      3      4      5      6      7
Old         54   35.2   25.9   51.8    100   42.6   29.6   27.8
Young       60   40.0   36.7   68.3    100   51.7   55.0   46.7


Omitting the 9 zero score Ss from the old group, the
percentages were recomputed for the distribution of error
scores (Table 3). The χ² between the two groups was now
nonsignificant (χ² = 3.5, 3 df; the value required at p = .05
is 7.82). While the general tendency towards less
responsiveness (lower error scores) continues to be shown in
the old group, the differences now are not significant. The
major difference between the two groups of Ss is apparently the
number of zero errors in the old group.


TABLE 3
DISTRIBUTIONS OF FREQUENCY AND PERCENTAGE OCCURRENCE OF ERROR SCORES

                                     Error Scores
Group            0     1     2     3     4    5    6    7    8    9   10   11   12   13   14  Total
Old      f     ...
         %    16.6  20.3  20.3  11.1  11.1  5.6  7.4  1.9    0  ...
         %      -   24.5  24.5  13.3  13.3  6.7  8.9  2.2    0    0  2.2  ...
Young    f     ...                                                     2    0    0    0    1     60
         %     ...                                              1.7  3.3    0    0    0  1.7    100

Note. An error score is the total number of responses (errors)
to lights other than the center light. The second row of
percentages for the old group is based on omission of the 9
zero-error Ss.


As can be seen from Table 1, the obtained generalization
gradients are irregular for both the old and young groups and
are not bilaterally symmetrical as would be theoretically
expected. While these findings were unexpected, the reasons are
not apparent from either the data or the procedures used.
Comparable irregular gradients, however, have been previously
reported (Mednick, 1955). Furthermore, while increased
responsivity at the most peripheral light positions has been
reported as early as 1939 (Humphreys), and more recently
(Mednick, 1955, 1958), no adequate explanation has been
offered.

While the differences in generalization between the two groups
(Table 1) were found to be significant when based on the total
group of old Ss, reanalysis of these data in terms of only
those Ss who made errors markedly alters the magnitude of the
differences (Table 1), so that comparison of the actual error
performance of the two groups (average difference between
percentages) was not significant (t = 0.62, 5 df).


TABLE 4
THE MEAN AND STANDARD DEVIATION OF INDIVIDUAL AVERAGE REACTION
TIMES TO THE CENTER LIGHT AND FOR ERROR RESPONSES TO EACH LIGHT
(IN HUNDREDTHS OF A SECOND)

                             Lights
            Group     1    2    3    4    5    6    7
Mean RT     Old      34   32   36   41   36   37   34
            Young    30   27   27   31   27   30   27
Sigma RT    Old      19   20   20   18   21   16   16
            Young     9   12    9    8    8   10    8


Reaction Times 


Reaction times were analyzed for test trials to the center
light, and for the other lights when an error was made. As
reaction times are contingent upon response to a light, if no
response was made the reaction time was infinite and counted as
zero. The nine errorless old Ss were, therefore, necessarily
excluded from analyses of error reaction time, reducing the old
group N again to 45. They are, however, included in all
analyses of reaction time to the center light as the percentage
response here was in all instances 100%. For purposes of
comparison, an average individual reaction time was computed
for the 59 responses to the center light, and an average
individual reaction time for error responses. The individual
average reaction time for center light and for error then
served as basic data for the age group comparisons.

The individual old and young reaction times were significantly
different for the center light trials (F, p < .05), as well as
for the error responses (F, p < .[...]1). From Table 4 it can
be seen that the average reaction times for the old group were
consistently higher than for the young for the center light
(Light 4) as well as for each of the peripheral light
positions. Moreover, at each light position, the older Ss were
more variable in average reaction times than were the younger
Ss. As a further examination of the differences in the
distributions of average reaction times for the young and old
groups, they were dichotomized at reaction time 40 hundredths
of a second. 100% of the young and 56% of the old had average
reaction times of 40 or less on the center light, and by the
same dichotomy, [...]% of the young and 71% of the old had
average reaction times for error of 40 or less. Both
differences in proportions are significant (p < .05), with the
average reaction times for the older Ss consistently longer
than for the young.

³ Although reaction times were measured in hundredths of a
second, they are treated as scores, and the decimal point
omitted in reporting.


Relationship between Speed and Accuracy


The correlation between the individual total errors and the
average reaction time for all trials (center light plus error
responses) was r = -.27 for the old and r = -.36 for the young.
While in both instances the correlations are significantly
different from zero (p < .05), they are not significantly
different from each other, indicating that this relationship
does not appear to be age related. While the negative
correlation is to be expected, i.e., lower individual mean
reaction time associated with higher total error scores, the
magnitude of the relationship is small, indicating that those
Ss in either group with the fastest reaction times are not
necessarily those who make the most errors. These relations, in
conjunction with the findings regarding errors and reaction
times, give support to Birren's (1955) observation that changes
in reaction time may take place (within limits) without
corresponding effects upon the number of errors made.
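A common way to test whether two independent correlations differ is Fisher's r-to-z transformation. The source does not say which test was used, so the sketch below is an assumption, as is the use of the full group sizes (54 old, 60 young); the function name `compare_correlations` is illustrative.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent r's.

    Returns the z statistic and its two-tailed normal probability.
    """
    z1, z2 = atanh(r1), atanh(r2)              # r-to-z transformation
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))     # standard error of z1 - z2
    z = (z1 - z2) / se
    p = erfc(abs(z) / sqrt(2))                 # two-tailed p under the normal curve
    return z, p


z, p = compare_correlations(-0.27, 54, -0.36, 60)
```

With these inputs z is about 0.52, well short of the 1.96 needed at the .05 level, which agrees with the statement that the two correlations do not differ reliably.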


⁴ The young group was too homogeneous in age to obtain
relationships between age, and speed and accuracy. For the old
group, though the number of cases at successive age intervals
was small, no trend was observed.


Reaction Times and Zero-Error Scores


As it was felt that the nine zero-error Ss might be atypical in
ways other than their error-free performance, their average
reaction times were analyzed separately. All nine were above
the median of the old group's average reaction times, and were
scattered throughout this range (i.e., throughout the
above-median range). The probability of all nine scores falling
above the median is .002 (Fisher's Exact Method). The mean of
their average reaction times was 46.3 (σ 6.9) as contrasted
with the mean average reaction time of 30.9 (σ 12.6) for the
other 45 Ss in the old group. As the error-free performances of
these nine Ss are associated with longer reaction times, it is
quite possible that slow reaction time was either a major
determinant, or at least a major aspect of their error-free
performances. In strict terms, an error-free performance might
be considered as most stimulus bound and, therefore, deficient
insofar as generalization is concerned. However, the reasons
for the errorless performances of the nine old Ss are not
evident from the data, with the possibility to consider, among
others, that rather than a deficiency in generalization, the
performance of these Ss may represent a basic difference in
attitude or task orientation which results in their slowing
down to maintain accuracy.
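The probability of .002 quoted for the median split can be approximated directly. The sketch below assumes that the median divides the 54 old Ss into 27 above and 27 below, and that the reported figure is two-tailed; neither detail is stated in the text, and the variable names are illustrative.

```python
from math import comb

n_old = 54          # all old Ss
n_above = 27        # assumed: the median splits the group evenly
n_zero_error = 9    # errorless Ss, all observed above the median

# Chance that all nine errorless Ss fall in the above-median half.
p_one_tail = comb(n_above, n_zero_error) / comb(n_old, n_zero_error)

# With equal halves the opposite outcome (all nine below) is equally
# likely, so the two-tailed value is twice the one-tailed value.
p_two_tail = 2 * p_one_tail
```

Under these assumptions the one-tailed probability is a little under .001 and the two-tailed value rounds to .002.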

The slower response latencies of the older Ss are consistent
with other findings (Welford, 1958) which show small but
significant and consistent slowing in the response latencies of
older persons. In view of the small magnitude of the over-all
relationship between speed and accuracy it is indeed difficult
to generalize that slowing of speed compensates in older Ss to
maintain accuracy. As shown previously (Welford, 1958), speed
of reaction can be considered as a sum of two parts: motor
speed and decision speed. Current opinion and evidence
(Welford, 1958) suggest that central factors influencing
decision speed are the more important. However, while
neurological concepts are frequently invoked to explain age
differences, it is likely that many of the observed phenomena
are due to attitudinal differences towards experimental tasks
and instructions, not only between divergent age groups but
also within samples of persons homogeneous with respect to age.
As in the present study, the errorless performance of the 9 old
Ss may as well be due to conscious or unconscious attitudinal
differences or, indeed, to both, as they are to neurological
factors.
Although the results showing greater intersubject variability
with increased age are consistent with previously reported work
(Welford, 1958), it should be recalled that the time between
the lighting of the ready light and the presentation of a
stimulus was randomly varied between 3 and 5 sec. to reduce
anticipatory responses. However, the shorter the interval
(foreperiod) between ready signal and stimulus, the longer the
reaction time (Gibson, 1941; Mowrer, 1940), with differential
effects due to age (Botwinick, Brinley, & Birren, 1957); that
is, the shorter time span slows the subsequent reaction time of
older Ss more than younger Ss. Consequently, the variable
foreperiod in this
study should result in greater variability 
of subsequent response latency in the old 
as contrasted to the young. Secondly, the 
younger group was much more homogene- 
ous in age which may also result in less 
intersubject variability in this group. It 
is probable that some of the increased 
variability in the performance of older Ss 
in this and other studies may be related 
to methodological and sampling differen- 
tials. 


SUMMARY 


Fifty-four aged Ss and 60 young Ss 
were examined on a visual-spatial task of 
stimulus generalization. The specific task 
was chosen because of its apparent lack 
of complexity, memory involvement, and 
relation to past experience—factors felt to 
handicap and lower the performance of 
aged persons. Data were analyzed on the 
basis of error performance and reaction 
time, as generalization is defined in terms 
of the number and position of errors. 

Response latency was found to be longer for the older Ss,
consistent with previous findings. Significant differences in
generalization (number and position of errors) were found
between the two groups, with less generalization in the old.
However, nine old Ss made no errors at all, in contrast to the
fact that an error-free performance never occurred in the young
group. To guard against disproportionate influence of errorless
old Ss, data were reanalyzed, excluding these nine from the
analyses. The differences in generalization between the two
groups were now not significant. The average reaction times of
these nine Ss were also found to be above the median for the
old group.

A low, negative correlation between total errors and reaction
time was found for both groups, with the lack of a
statistically significant difference between the two
correlation coefficients indicating an apparent lack of age
relatedness.


REFERENCES 


Arnhoff, F. N., & Loy, D. L. Relationship between two measures
of stimulus generalization: Influence of intelligence upon
performance. Psychol. Rep., 1957, 3, 465-470.

Arnhoff, F. N. Research problems in gerontology. J. Geront.,
1955, 10, 452-456.

Bilodeau, E. A., Brown, J. S., & Meryman, J. J. The summation
of generalized reactive tendencies. J. exp. Psychol., 1956,
51, 293-298.

Birren, J. E. Age changes in speed of simple responses and
perception and their significance for complex behavior. In
Old age in the modern world: Report of the Third Congress of
the Int. Assoc. of Geron. London: Livingstone, 1955. Pp.
235-247.

Botwinick, J., Brinley, J. F., & Birren, J. E. Set in relation
to age. J. Geront., 1957, 12, 300-305.

Brown, J. S., Bilodeau, E. A., & Baron, M. R. Bidirectional
gradients in the strength of a generalized voluntary response
to stimuli on a visual-spatial dimension. J. exp. Psychol.,
1951, 41, 52-62.

Dennis, W. Age and behavior: A survey of the literature. USAF
Sch. Aviat. Med. proj. Rep., 1953, Proj. No. 21-0202-0005,
Rep. No. 1.

Fisher, R. A. Statistical methods for research workers. (11th
ed.) New York: Hafner, 1950.

Gibson, J. J. A critical review of the concept of set in
contemporary experimental psychology. Psychol. Bull., 1941,
38, 781-817.

Hovland, C. I. Human learning and retention. In S. S. Stevens
(Ed.), Handbook of experimental psychology. New York: Wiley,
1951. Pp. 613-689.

Humphreys, L. G. Generalization as a function of method of
reinforcement. J. exp. Psychol., 1939, 25, 361-372.

Korchin, S. J., & Basowitz, H. Age differences in verbal
learning. J. abnorm. soc. Psychol., 1957, 54, 64-69.

Mednick, S. A. Distortions in the gradient of stimulus
generalization related to cortical brain damage and
schizophrenia. J. abnorm. soc. Psychol., 1955, 51, 536-542.

Mednick, S. A. Stimulus generalization as a function of level
of achievement imagery. Psychol. Rep., 1958, 4, 651-654.

Mowrer, O. H. Preparatory set (expectancy): Some methods of
measurement. Psychol. Monogr., 1940, 52(2, Whole No. 233).

Ruch, F. L. The differentiative effects of age upon human
learning. J. gen. Psychol., 1934, 11, 261-268.

Smudski, J. F., & Braun, H. W. Changes in the gradient of
visual stimulus generalization as a function of age. Amer.
Psychologist, 1956, 11, 374. (Abstract)

Watson, R. I. The personality of the aged. A review. J.
Geront., 1954, 9, 309-315.

Weinberg, J. Personal and social adjustment. In J. E. Anderson
(Ed.), Psychological aspects of aging. Washington, D. C.:
American Psychological Association, 1956. Pp. 17-20.

Welford, A. T. Aging and human skill. Oxford: Oxford Univer.
Press, 1958.


(Received February 9, 1959) 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 6, 1959 


SCHOLASTIC BEHAVIOR AND ORIENTATION 
TO COLLEGE 


ROBERT C. BIRNEY AND MARC J. TAYLOR²
Amherst College 


Soon after a student arrives at college two broad realms of
activity begin to modify his behavior. On the one hand, he is
required by the college to attend classes, study, prepare for
examinations, etc. On the other hand, much of his time is spent
with his peers whose norms regarding socializing, dating, etc.
have a demand character. The particular effect of these two
main types of activity will depend in part on the nature of the
reinforcements which he finds within them. The curriculum
offers grades, faculty recognition, prestige, and a sense of
mastery of a certain body of material. The student culture
offers prestige for social leadership, peer recognition, and a
sense of escape from the curricular demands. Presumably, the
student attempts to maximize his rewards and minimize his
penalties by developing a mode of operation during his freshman
year which, if moderately successful, probably becomes quite
stable by the senior year.

This study was designed to permit identification of the various
behavioral and attitudinal patterns reported by college
seniors. The attitudinal pattern was conceived as an
"orientation to college" to be measured by a specially
constructed inventory of attitude items called the Orientation
to College Inventory (OTCI). We have assumed that the
scholastic (SCH) and social (SOC) orientations are measurably
distinct. Thus, a student with a positive orientation toward
social affairs (SOC) would seek most of his reinforcement
within the realm of student social life, while the student with
a positive orientation toward scholastic affairs (SCH) would
seek his major rewards in curricular pursuits. Obviously, these
two orientations interact, and it is expected that there are
behavioral correlates of these patterns of interaction. The
college behaviors under study were those of coping with the
curriculum and choosing a career. This information was obtained
by interviewing students at some length about the expectations
they had held toward courses, the considerations generally
adopted in choosing courses, and the reasons for taking the
courses they did. Of course, it is impossible in this type of
study to establish any causal relationships between the
attitudinal and behavioral measures. Rather we must be content
to establish the nature of the patterns which exist, with an
eye to developmental studies later. Finally, the attitudinal
and behavioral reports were supplemented with data gathered
from the "public" records the students had established in
extracurricular participation, counseling, and academic
achievement.

² Now at Columbia Medical College.

The hypotheses tested in this investigation involved the
relationship between our attitude measure, the Orientation to
College Inventory (OTCI), and the reported behaviors. They are
set forth in table form after the measures and groups involved
have been described in greater detail. These hypotheses were
derived from the observations of the junior author regarding
the differences between students in these behaviors. In
treating the material on extracurricular activity and use of
the college counseling service, no hypotheses were stated. We
simply wished to compare the major attitude-behavior pattern
groups on these data.


PROCEDURE


The major results of this study are based on the data from a
sample of 57 Amherst College seniors. Every fourth senior who
had been in continuous residence at the college since freshman
year was selected from the Student Directory. Data were
obtained from all seniors selected except one who left school
while the study was in progress. Also, 63 freshmen were
randomly selected for an administration of the OTCI.
Twenty-four of these Ss volunteered to take the OTCI for a
second time two months later.

The seniors were interviewed at the beginning of the second
semester over a five-week period. All interviews were conducted
by the junior author, 41 being held in the S's fraternity
house, and the remainder taking place in the laboratory.
Interview time ran from one hour to two and one-half hours. The
S's responses were recorded in code on response sheets. Many
interview items were presented on cards to the Ss.


Interview 


The full interview schedule runs to [...] double-spaced pages
and may be obtained by writing the senior author.

The general interview procedure is best illustrated by the
following example:


The purpose of this interview is to find how Amherst men choose
courses. Anything you say, of course, will be kept in
confidence. As you see I have assigned a code number to you.
What you will say will not be associated with your name. (E
[...] S's code number on data sheet.) Here is a record of all
the courses you have taken [...]. Are there any errors or
omissions? Is your senior schedule in order? Let's go back to
the time when you were choosing courses. Try to imagine how you
felt at that time. (E selects a course that S took [...] year
and gives a brief explanation of what [...] meant.) I am going
to ask you to tell me some of your expectations and feelings
about these courses. (E hands S a loose-leaf binder which
contains 5" by 8" index cards on which the rating scales are
typewritten.)

The first card is:


To what extent did you think it would be important to your
general educational development?


1        2        3        4        5        6        7        8
I considered it          I thought it would       I didn't think it
extremely important      be of moderate           would have much to
to my general            importance to my         do with my general
education.               general education.       education.


All of the course data were obtained using 
this interview technique. Some items deal- 
ing with Career Plans and the Major 
Field were informational, with a few being 
open-ended. In every instance the inter- 
viewer presented the item orally as well 
as visually, and clarification via stand- 
ardized interviewer replies was possible. 
At the conclusion of the interview the S 
was given the OTCI to be filled out and 
returned. 

Following the graduation of the Ss, 
records were obtained of extracurricular 
participation, contact with the college 
counselor, CEEB verbal scores and grades. 
At the end of the analysis of these data 
the senior author discussed the various S 
groupings with those college officials most 
likely to have impressions of the students. 


The Orientation to College Inventory 
(OTCI) 

Having postulated the attitudinal con- 
struct of orientation to college, we needed 
an instrument from which a student's 
orientation could be inferred. No existing 
measure seemed appropriate. This instru- 
ment, in addition to meeting ordinary 
psychometric requirements, had to be ac- 
ceptable to Ss in the Amherst College (all 
male) population. Accordingly items were 
written with this population in mind and 
groups of students were asked to cooperate 
in the phrasing of the items. The final 
form consists of 60 Likert-type items taken 
from the original pool of 150 items. There 
are 19 SOC items, 20 SCH items, and 21 


filler items.

Item analysis of the OTCI responses of the seniors was performed using a simple chi square technique in which the discriminability of each item was tested
against the sum of all others. First to be 
checked were five SCH items concerned 
with “grade-getting” since it was felt these 
might not correlate with the basic scho- 
lastic orientation of the other SCH items. 
They did not and were dropped. Eight 
additional items failed to relate to their 
respective dimensions and were eliminated 
from the final scores. 

The scores used in this study are based 
on 14 SOC items and 12 SCH items, ex- 
amples of which follow: 


Social items

1. It is important to make a lot of friends at college. (A)
2. I believe that having a good time and getting a full share of fun out of college life is as important as any other aspect of your experience there. (A)

Scholastic items

1. I enjoy studying. (A)
2. I wish that I could take more courses than the college allows. (A)

Until cross-validation of these items on a larger senior sample is carried out, present findings must be interpreted with this limitation in mind.

The seniors showed Spearman-Brown 
coefficients of +.81 for the SCH and +.74 
for the SOC, while the freshmen values 
were +.46 and +.40 respectively. The 
product-moment correlation between SCH 
and SOC scores was —.41 for the seniors 
and —.34 for the freshmen. Clearly, the 
freshmen gave much less consistent pat- 
terns of scores. 


Subject Groupings 


In taking the OTCI the senior sample 
gave four sets of responses, one for each 
of the college years. Correlations between 
successive years indicate little variability 
in these remembrances, though the fresh- 
man year scores do show the least rela- 
tionship to the senior year scores. The 
SCH score for senior year yields a product-moment r with junior year of +.87, with sophomore year of +.67, and with freshman year of +.61. The SOC scores show


ROBERT C. BIRNEY AND MARC J. TAYLOR 


similar relationships. Senior-junior is +.88, 
senior-sophomore is +.86, and senior- 
freshman is +.66. 

In the present analysis the S's score is the mean of the "junior" and "senior" responses. The two distributions were split at the medians. Five Ss whose scores coincided with the SOC median were eliminated from subsequent analysis. On the basis of the two scores each S was assigned to one of four groups, High SCH-High SOC (A), High SCH-Low SOC (B), Low SCH-High SOC (C), and Low SCH-Low SOC (D). These Ns are 10, 16, 15, and 11, respectively.
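
The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_groups`, the subject ids, and the scores are hypothetical, and the sketch drops a subject tied with either median, whereas the text reports only that Ss tied with the SOC median were eliminated.

```python
from statistics import median


def assign_groups(scores):
    """Map each subject id to group 'A', 'B', 'C', or 'D'.

    scores: dict of subject id -> (sch, soc), hypothetical scale scores.
    Subjects whose score equals either median are left out (an assumption;
    the text mentions only SOC-median ties being eliminated).
    """
    # Medians are taken over the full sample, before any ties are dropped.
    sch_median = median(sch for sch, _ in scores.values())
    soc_median = median(soc for _, soc in scores.values())
    labels = {
        (True, True): "A",    # High SCH-High SOC
        (True, False): "B",   # High SCH-Low SOC
        (False, True): "C",   # Low SCH-High SOC
        (False, False): "D",  # Low SCH-Low SOC
    }
    groups = {}
    for subject, (sch, soc) in scores.items():
        if sch == sch_median or soc == soc_median:
            continue  # tied with a median: not classifiable as High or Low
        groups[subject] = labels[(sch > sch_median, soc > soc_median)]
    return groups


# Hypothetical scores, invented for illustration only.
example_scores = {
    "s1": (40, 50),
    "s2": (42, 30),
    "s3": (20, 52),
    "s4": (18, 28),
    "s5": (30, 40),  # sits on both medians and is dropped
}
example_groups = assign_groups(example_scores)
```

With real data the group sizes would be read off by counting the labels in the returned dictionary.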

The hypotheses tested in this study, as they relate to the predicted behavior of these several groups, are presented in Table 1.


RESULTS 


Correlates of the OTCI 


Available for all Ss were the College Entrance Examination Board verbal scores, senior year scholastic average, and cumulative averages through the first semester senior year. Table 2 presents the product-moment r's for the OTCI scores and these grade measures, and the relationship between the two measures themselves. A previous study has led us to expect the CEEB verbal to correlate with the cumulative average on the order of +.60. It appears that for seniors endorsement of the "scholastic" items is positively related to general verbal ability and to academic performance. The relationship of the SCH score to cumulative average with verbal ability partialed out is [...]. These relationships were not found for the freshmen.


Comparison with Freshmen 


The fact that the seniors completed the OTCI for each year of college as they remembered it invites comparison between the memory of freshman orientation and those obtained from the freshman class.


ORIENTATION TO COLLEGE 269 


TABLE 1
HYPOTHESIZED RELATIONSHIPS BETWEEN OTCI SCORES AND INTERVIEW ITEMS

Expected Group Difference: A + B (High SCH) greater than C + D (Low SCH)

  Expected their courses to be more:
    Hypothesis 1 (Item 1): important to their general education
    Hypothesis 2 (Item 2): difficult compared to all the courses they could have taken
    Hypothesis 3 (Item 3): enjoyable
    Hypothesis 4 (Item 4): pertinent to their after college career plans
  Rate more important in choosing courses:
    Hypothesis 5 (Item 1): the extent to which the course would contribute to (their) general education
    Hypothesis 6 (Item 2): the pertinence of the courses to (their) future career plans
  Reasons given for choosing courses will show:
    Hypothesis 7 (Item 1): a greater proportion of intrinsic reasons, e.g., ". . . really interested in it"
    Hypothesis 8 (Item 2): a greater proportion of instrumental reasons, e.g., ". . . need it for graduate school"
    Hypothesis 9 (Item 3): a lesser proportion of evasive reasons, e.g., ". . . it met at a convenient time"

Expected Group Difference: C + D (Low SCH) greater than A + B

  Rate as more important in choosing courses:
    Hypothesis 10 (Item 1): the difficulty of the type of material for you
    Hypothesis 11 (Item 2): the grades that you thought you could obtain in the courses

Expected Group Difference: C (Low SCH-High SOC) greater than A, B, D

  Expect their courses to have:
    Hypothesis 12 (Item 1): fewer hours of outside class work required
    Hypothesis 13 (Item 2): to do fewer hours of outside class work
  Rate more important in choosing courses:
    Hypothesis 14 (Item 1): the number of hours of work (they) thought they would have to do

Expected Group Difference: B (High SCH-Low SOC) greater than A, C, D

  Expect their courses to have:
    Hypothesis 15 (Item 1): more hours of outside class work required
    Hypothesis 16 (Item 2): to do more hours of outside class work
  Rate as unimportant in choosing courses:
    Hypothesis 17 (Item 1): the number of hours of work they thought they would have to do


The freshmen scores show significantly greater variability (nearly twice as much) than the seniors' "freshmen" scores, on both dimensions. The SOC means are quite close together, but the freshman SCH mean is five points higher, an untestable difference. The lower freshmen sample reliability, the greater spread of score, and the differences in mean suggest that freshmen are considerably more confused about their orientation to college than are seniors.

TABLE 2
CORRELATES OF THE OTCI SCORES FOR SENIORS AND FRESHMEN

                Seniors (N = 57)                Freshmen (N = 63)
Variable        CEEB Verbal   Cumulative        CEEB Verbal   Cumulative
                              Average                         Average
SCH Score       +.37*         +.45*             +.10          +.08
SOC Score       -.2[...]      -.20              -.05          [...]
CEEB Verbal                   +.97*                           +.58

* P < .0[...]


Analysis of the Interview Data 


The hypotheses were tested by use of 
the Mann-Whitney U test (Siegel, 1956).
This required that the 52 Ss be placed in 
rank order for each item to be tested. 
Medians were obtained for each group, 
and the extent to which these medians 
depart by chance from each other in the 
predicted direction ascertained. Ranking 
was done by using the scale position the S 
endorsed, the number of hours he re- 
ported, or a classified proportion of rea- 
sons selected from a total number of rea- 
sons chosen. The exploratory analysis of 
informational items was done with the 
chi-square technique. 
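
The ranking step described above is what the Mann-Whitney U statistic is built on. The following is a minimal sketch of the standard rank-sum computation, not a reconstruction of the authors' own procedure; the function names and the two small samples are hypothetical, and tied observations receive the average of the ranks they span.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(first, second):
    """Return (U for first sample, U for second sample)."""
    n1, n2 = len(first), len(second)
    ranks = average_ranks(list(first) + list(second))
    rank_sum_first = sum(ranks[:n1])
    u_first = rank_sum_first - n1 * (n1 + 1) / 2
    u_second = n1 * n2 - u_first
    return u_first, u_second


# Hypothetical scale positions for two groups, invented for illustration.
u_separated = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_with_ties = mann_whitney_u([1, 2, 2], [2, 3])
```

The smaller of the two U values is the one usually referred to a table of critical values; complete separation of the two samples gives a U of zero.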


Test of Hypotheses 


Table 3 summarizes the findings for the 
hypotheses summarized in Table 1. Of 
the 17 predicted relations only four fail 
to appear. These will be discussed subse- 
quently. 

Information was also gathered about the details of course selection practices. Table 4 contains the questions, parameter data, and group difference levels of confidence for this portion of the interview schedule.


TABLE 3
TESTS OF SIGNIFICANCE FOR HYPOTHESES IN FIG. 1

Expected Group Difference                        Hypothesis   χ² (a) or H (b)   P
A + B (High SCH) greater than C + D (Low SCH)    1            18.4              .001
                                                 2            3.9               .05
                                                 3            7.7               .01
                                                 4            13.3              .001
                                                 5            7.8               .01
                                                 6            2.79              ns
                                                 7            28.0              .001
                                                 8            3.3               ns
C + D (Low SCH) greater than A + B (High SCH)    9            9.33              .01
                                                 10           11.4              .001
                                                 11           33.4              .001
C (Low SCH-High SOC) greater than A, B, D        12           [...]             ns
                                                 13           [...]             ns
                                                 14           [...]             [...]
B (High SCH-Low SOC) greater than A, C, D        15           [...]             [...]
                                                 16           [...]             [...]
                                                 17           [...]             [...]

a Median test (Siegel, 1956).
b Kruskal-Wallis (Siegel, 1956).

Five questions were asked concerning 
the major program. The groups do not 
differ in major field, time of major deci- 
sion, majors considered, or change in major. The High SCH Ss report significantly
greater confidence that they would choose 
the same major again. 

Fifteen questions were asked about career choices and current plans. Here only the SCH dimension (Groups A and B) showed any power of differentiation. Eighty-one per cent of Ss with High SCH scores intended to pursue professional as opposed to business careers, while only 42% of the Lows reported professional aspirations. Since very few Lows intended to attend graduate business schools, far more Highs would receive graduate training than Lows. This presumably reflects the ability and cumulative average relations presented. When asked how many changes of plans had taken place in the last six months, the High SCH Ss reported significantly fewer than the Lows.

Finally, the interviewees were asked how much time they had spent studying for each course. These inquiries covered the courses taken during the junior year and the first semester of the senior year. In an effort to learn more about our Ss we turned to the record of "public [...]" which they had left behind. This was done mainly to learn more about Group D, although we expected to discover something about the others as well. An analysis was done of the college records of extracurricular participation so as to examine the number and nature of different activities entered during the four years. Table 5 shows a comparison of the groups for study time, number of different activities over four years, and type of the activities.

It is clear the High SCH groups A and B had a record of wide participation while the Low SCH groups had a [...]


TABLE 4
GROUP DIFFERENCES IN COURSE SELECTION PRACTICES

How many days did you take to [...] selected? (From the time you picked up the course [...] until the time you handed in the program.)
    Range: 1 to 30 days. Median: 3 days. Group difference (1): A, B (High SCH) more. P: <.01

How many hours did you spend talking with the following: (2)
    Friends. Range: [...]. Median: 3 hours. Group difference: none.
    Faculty. [...]

Would you estimate the percent of your knowledge of courses that comes from:
    Other students. Range: 10% to 98%. Median: 25[...]. Group difference: C, D [...]. P: [...]
    Catalogue. Range: 1% to 90%. Median: 24%. Group difference: none.
    Faculty (by asking them). Range: 0% to 75%. Median: 5%. Group difference: A, B > C, D. P: <.01
    Your own observations. Range: 0% to 75%. Median: 24%. Group difference: none.

Have you experienced any [...] choosing [...] which involved wanting to take more courses than you could?
    Yes: 81%. No: 10%. Group difference: A, B, D > C. P: <.001

[...] a choice among [...] you didn't [...]?
    Yes: 40%. No: 60%. Group difference: A, C, D > B. P: [...]

[...] ever sought to balance [...] in any way?
    Yes: 71%. No: 29%. Group difference: C, D > A, B. P: [...]

To what extent has the hour at which the courses met influenced your choices?
    Some: 37%. None: 63%. Group difference: C > A, B, D. P: [...]

1 All P values obtained from chi square analysis of 4 by 2 tables.
2 [...] parent, and others elicited a response from only 3 Ss.


As to the nature of this participation, we divided the activities into four groups: Service organizations, Talent organizations, Interest groups, and Sports.² Note that the nature of the activity which the High SCH's displayed is spread over both Service and Talent organizations, while Group D can only be described as disinterested in all but Sports. However, it must be borne in mind that the Sports figure is based on participation in at least one varsity sport, and does not imply a sports "career." The fact is there were only 10 varsity letter winners in the entire sample, and they were evenly distributed across all groups. Furthermore, the Group D record for Sports is not different from the other groups.

2 Service organizations were: Chest Drive, [...] Tutorial System, Mardi Gras [...] Committees, Student Committee [...] Faculty, and House Management Committee. Talent groups were: Band, radio [...], yearbook, glee club, singing [...], theatre, literary magazine, and student [...]. Interest groups were: Law [...], [...]ing club, Philosophy club, Christian Association, and Sailing club. Sports was defined as participation in a varsity sport.
To summarize this analysis of the public records of the students, it appears that the High SCH Ss were more apt to perform public services, and exercise natural talent, while we know little more about Group D, the Low-Lows, than we did at the outset. However, it seems safe to say that they were not using the usual campus organizations as sources of reinforcement. Even more interesting perhaps is the fact that Group C did not seem to be satisfying their SOC interests in campus-wide organizations. Presumably the fraternities served this purpose.


TABLE 5
COMPARISON OF GROUPS ON MEAN STUDY TIME, NUMBER OF ACTIVITIES, AND TYPE OF ACTIVITY

                                     Group A   Group B   Group C   Group D   P
Mean study time (F is 2.86, 3 df)    20.9      28.8      16.9      18.3      .05
Number of activities
    0-2                              1         1         7         9
    3-4                              5         8         5         2
    5-10                             4         7         3         0         .001
Percentage taking part in one or more groups
    Service                          40        69        33        27        .02 (B vs. A, C, D)
    Talent                           60        75        33        27        .01 (A-B vs. C-D)
    Interest                         30        44        60        09        .05 (C vs. A, B, D)
    Sports                           70        56        73        82        .30 (A-B vs. C-D)

Chi square based on 3 X 2 table (A-B vs. C-D) is 17.24.


DISCUSSION

It appears that the response patterns on the OTCI have systematic relationships to the answers given to the interview questions concerning curricular behavior and concerns. Of course, it has not been established that the Ss did behave as they remember behaving, or have the expectations they now recall. But certainly the picture presented has a considerable amount of coherence, and if we are dealing here with the phenomenon known as "response set" these sets themselves show coherence. Furthermore some of the questions dealt with career plans, and the choices described seem consistent with the background information the Ss have given.

Of the two dimensions the SCH scale displays the most relationships. This is not wholly unexpected since the interview items are confined to scholastic behavior. Presumably the introduction of questions concerning the social life and practices of the students would be required for the SOC dimension to generate relationships.

The two hypotheses which predicted that High SCH Ss would report choosing courses because they were instrumental to career or graduate work were not confirmed. Apparently all groups choose courses with career considerations in mind, since the other career differences probably reflect ability differences rather than differences in [...]. These relationships suggest that a S's orientation in college may carry implications for the S's orientation toward his work in graduate school or business.

The other two hypotheses not confirmed predicted that Group C would not only rate as important but also expect their courses to demand less outside work, and expect to do less outside work. Groups A and D are similar to Group C on these items. Since Group C did rate as important the amount of work demanded by the course it may be that the distinction between amount of work and outside work is too refined for all but Group B, who did expect more outside work, and considered the amount of work unimportant in choosing courses.
The relationships which have emerged are that Ss with High SCH scores more often report they expected their courses to be important to their general education, difficult compared to all the courses they could have taken, enjoyable, and pertinent to their career plans. They rate as important considerations for choosing courses, the contribution of the course to their general education and the pertinence of the course to their career plans, and as unimportant the difficulty of the course material and the grades they thought they could obtain. They give more intrinsic and fewer evasive reasons for choosing the courses they did. In choosing courses they report spending more time, placing less reliance on students, being less concerned over "balance" in their schedules, and placing more reliance on the faculty for course information. Far more of these Ss are going to graduate school, and intend to enter professional as opposed to business careers. (Of the eight Ss who intend to enter teaching, seven had High SCH scores.) The positive correlation of the SCH with college average and CEEB verbal indicates that these men have the ability to do the work, and do it. It is very interesting that the pattern of correlations for freshmen, while reflecting the same direction of relationship, shows no significant correlation between SCH scores and either ability or freshman year average. Obviously this may imply differences in sample, year in college, or a lack of reliability in the OTCI. Another possibility, suggested by the comparison of senior and freshman scores, is that orientation to college is something which emerges with experience as the individual discovers those sources of reinforcement which are available to a person with his ability and talent.


The group patterns of the SCH-SOC scores show some signs of usefulness. In general the High SCH-High SOC Group A seem most influenced by their SCH orientations, but the way in which the High SCH-Low SOC Group B departs from Group A is interesting. Group B reports expectations of more hours of outside class work, intention to do more outside work, and rate as unimportant the number of hours that they thought they would have to work. They spent more time with the faculty choosing courses, and had the least avoidance conflict in making choices. Furthermore, this group reported actually doing more outside study than the other three groups. Note that Group A did not join Group B in these particulars. Perhaps Group A's High SOC orientation blocked them from this kind of total commitment to curricular concerns.

The Low SCH-High SOC Group C distinguished themselves by rating as important the number of hours of outside work they thought they would have to do, by giving more consideration to the meeting time of a course, and by reporting far less approach conflict in choosing courses. Although their reported study time was not significantly less than that of Groups A and D, their average of 16 hours per week seems, in some absolute sense, quite low.

Group D, the Low SCH-Low SOC group, was not uniquely distinguished on any of the criterion items. Of these Ss, 75% are below the college median for the CEEB verbal scores and this may be the key to understanding their position. Presumably they have to work for what they get, and this tends to limit their commitment to the social orientation, while to claim a scholastic orientation would be patently unrealistic.

A check of the records of the college counselor showed that Group D was not distinguished from the other groups in number of counseling interviews for nonvocational purposes. Discussion with college officials also failed to reveal any distinguishing characteristics of Group D, unless it was the fact that these men were simply not well known. This was probably the most striking fact. As a group, the Ss with the Low SCH-Low SOC scores on the OTCI seem to have remained for four years "on the sidelines" of college life. Needless to say this type of finding continues to suggest additional hypotheses for future test.

We feel this study has accomplished its 
central purpose of establishing the nature 
of some of the various attitudinal-behav- 
ioral patterns which distinguish between 
college seniors. It appears that the chief 
variables at work are those of ability, 
talent, orientation to college, and rein- 
forcement patterns provided by the col- 
lege. At the end of four years these vari- 
ables seem to constitute stable patterns of 
behavior having serious implications for 
an estimate of the educational experience 
of the student. At the moment we know 
nothing of the pattern of these variables 
early in the freshman year, or of the 
changes which occur by junior year. Since 
many of our interview items were of a 
reminiscence nature it will be necessary to 
compare them with the self-reported be- 
havior of students at these stages in their 
college experience. 

The present study suggests some hypotheses for these studies. Students who are most likely to display an extreme SOC orientation at the end of four years may be the middle ability group rather than the low ability group. Apparently these Ss are more apt to have scholastic aspirations which are destined to be frustrated. Another emergent hypothesis is that the nonscholastically oriented student may have a greater proportion of his concerns outside of college altogether. Finally, it is obvious that an intensive study of the social practices of student life is needed.


SUMMARY 


Fifty-seven Amherst College seniors were interviewed regarding past curricular expectations and practices. It was hypothesized that their curricular behavior would be systematically related to a two dimensional conception of attitudes toward scholastic and social areas of activity. A 60-item Likert-type questionnaire dealing with college life was developed which yielded a Scholastic score and a Social score for each S. Seventeen hypotheses were tested and 13 confirmed. Exploratory relationships between scores and Ss' ability, grade point average, and participation in college groups were also determined.

Four coherent behavior patterns emerged when the scholastic and social scores were combined and divided into High SCH-High SOC, High SCH-Low SOC, Low SCH-High SOC, and Low SCH-Low SOC. Some implications of these patterns are discussed.

REFERENCE

Siegel, S. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956.


(Received April 28, 1959) 




Journal of Educational Psychology
Vol. 50, No. 6, 1959


THE RELATIONSHIP BETWEEN SALARY POLICIES
AND TEACHER MORALE¹


CLAUDE MATHIS 
School of Education, Northwestern University 


The topic of morale has long been of interest to psychologists. According to Haire [...] probably no other field in the social psychology of industry can match the number of publications accounted for by studies of morale. Despite the generosity of [...], morale still remains a variable which is difficult to define, although there is little doubt that the phenomenon is real and does account for variations in behavior. The field of education has shown much interest in morale as it relates to job satisfaction in teaching, but this interest has resulted more in the voicing of opinions than in attempts to research the problem adequately.

[...]ally in no other area of education have opinions about teacher morale been as [...] and as diverse as those concerning merit rating as a method to be used in determining salary level. An abundance of opinion can be found in the educational literature today both for and against the desirability of pay differentials based upon professional evaluation of services rendered, with those individuals who oppose such evaluation, or merit rating, pointing out that teacher morale should suffer as a result of the procedure. Such arguments are based on the assumption that merit rating is a psychologically dangerous process since it invites an invidious comparison. Also, studies in the psychology of [...] demonstrate that when individuals are judging other individuals there is tremendous room for error and distortion in the perceptions involved. On the other hand, those who favor merit rating emphasize that it should increase the morale of the more competent individual, thus making him more productive, since his needs for achievement will be met more adequately by the recognition granted for outstanding service. The psychological efficiency of the total educational effort should be increased because merit rating would tend to discourage the less competent, thus helping to separate them from a profession for which they are ill suited. The importance to the educational psychologist of the teacher's morale is apparent, especially if morale is thought of as a psychological attribute which can be profoundly influenced by external factors such as the manner in which the teacher is paid for the services he performs.

1 The research reported here was done with the cooperation of B. J. Chandler, Professor of Education, Northwestern University, and with the support of a research grant from the Graduate School of Northwestern University.


PROBLEM 


The specific purposes of this investigation were to design and test an attitude inventory for measuring teacher morale, and to determine if teacher morale is significantly related to salary policy in a small sample of school systems. A number of writers and organizations argue that high teacher morale and merit salary schedules are, to a considerable degree, mutually exclusive. If this argument is valid, one should find low teacher morale in school systems that use merit pay schedules. High, medium, or low morale might be found in school systems that use seniority type or single salary schedules. As a result of these opinions, the assumption was made in this study that a statistically demonstrable relationship exists between teacher morale in a school system and the use or nonuse of a merit type salary schedule.² The assumption was also made that morale has identifiable behavioral dimensions that can be measured through the use of an adequately constructed instrument which would reflect a general consensus as to the meaning which the term "morale" has for education. From these assumptions the following hypotheses were derived for testing:

1. Morale differences exist between 
schools which include superior perform- 
ance as a factor in determining pay and 
schools which use a single salary schedule 
based on seniority, educational back- 
ground, and/or other factors. 

2. Such differences in morale which exist 
are in the direction of lower morale in 
schools which use a merit pay plan. 


THE MORALE INVENTORY 


In order to test these hypotheses it was 
necessary to develop an attitude inventory 
capable of giving a quantitative index of 
morale relative to a person's role as a 
participating member of a school system.
Morale should be reflected in the attitudes 
a person has about himself and about per- 
sons and things in his behavioral field. 
With respect to the research reported here, 
these attitudes center around the school 
environment. 

In constructing the inventory, five atti- 
tude areas were identified which would 
provide a seemingly adequate measure- 
ment of the attitude possibilities inherent 
within the school environment: 


1. Self: Attitudes about the self in relation to the role played in the school system.
Sample item: I feel that I am an important part of this school system:
....With rare exceptions at all times
....Practically never
....Most of the time
....Only part of the time

2. School: Attitudes about the immediate aspects of the school situation [...] working conditions, equipment, and the physical plant.
Sample item: The size of the class (classes) I work with is:
....Large but satisfactory
....Completely unreasonable
....About right
....Unsatisfactory

3. Community: Attitudes about the community in which the school is located.
Sample item: Getting acquainted with people in this community is:
....Almost impossible to accomplish
....A hard thing to do
....Done with some effort
....Quite easy to do

4. Administration: Attitudes about the way the school is administered, and attitudes about the people who administer.
Sample item: When my teaching schedule is planned:
....I am never consulted
....My desires are always [...] preference whenever possible
....I am consulted and my desires made known but seldom considered
....My desires are usually considered

5. Policy: Attitudes concerning the policies and policy making functions related to the school system.
Sample item: Policies under which pay raises are granted are:
....Very unsatisfactory
....Sound and fair
....Unfair in many instances
....Reasonably satisfactory

2 For purposes of the research, the term "merit salary schedule" refers to any salary schedule for teachers which provides for salary above the regular schedule as a reward to teachers who are judged to be giving superior performance.


Ten statements, samples of which are provided above, were included in the inventory for each area defined. The statements were by no means exhaustive but appeared to represent a sampling of factors within each area which might involve morale. Three individuals who were familiar with morale studies in the field of psychology served as judges in the selection of the statements. A number of statements were rejected because, in the opinion of these judges, they were not statements which would have morale implications. The ten statements in each of the five areas represented the possibility of an attitude which would correlate highly with feelings of morale. No statement was used which was not unanimously agreed upon by the judges.

The 50 statements which were derived were converted into multiple choice items in which Ss could choose one of four possible endings. The same expert judges who participated in the selection of the statements also participated in the construction of the possible choices. Endings were constructed for each statement which would indicate very high and very low morale. Two other endings were then constructed indicating degrees of high or low morale between the two extremes. No attempt was made to determine whether or not the endings represented approximately equal intervals except through the agreement of the judges. If agreement was not reached on a particular ending, it was [...] or revised until accepted by all three judges.
Before being printed in final form, the 50 statements were randomized, and the four possible endings were randomized for each statement. In this manner an attempt was made to avoid any pattern which might have acted to influence the choice of endings. In order to obtain some indication of validity other than that involved in the agreement of the three judges who participated in the selection of the items, three individuals who knew nothing of the research project or the attitude inventory were asked to indicate which of the 50 items belonged to each of the five attitudinal areas. A contingency coefficient was computed for each of the three judges as an index of the degree to which they were able to assign statements to appropriate areas. These were positive and ranged from .76 to .79. The same individuals were also asked to rank the response endings in terms of the degree to which they indicated either high or low morale. The disagreement between judges, and the disagreement of each judge with the already predetermined direction of the endings, was negligible.

The reliability of the attitude inventory 
was tested by means of the split-half 
method. The correlation between scores on 
the first 25 items with scores on the last 
25 items yielded a coefficient of .74. Using 
this correlation in the Spearman-Brown 
formula for a double length test a reli- 
ability coefficient of .85 was obtained. 
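
The step from the half-test correlation to the full-length reliability can be checked with the general Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), where k is the factor by which the test is lengthened. The short sketch below is only an arithmetic illustration; the function name `spearman_brown` and its parameter names are hypothetical.

```python
def spearman_brown(half_test_r, length_factor=2.0):
    """Project the reliability of a test lengthened by length_factor.

    half_test_r: correlation between the two parts of the test.
    length_factor: 2.0 for the usual split-half (double length) case.
    """
    return length_factor * half_test_r / (1 + (length_factor - 1) * half_test_r)


# Correlation of .74 between the first 25 and the last 25 items, as reported.
full_length_reliability = spearman_brown(0.74)
print(round(full_length_reliability, 2))  # prints 0.85
```

For the double length case this reduces to 2(.74) / (1 + .74) = 1.48 / 1.74, which rounds to the .85 reported.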


THE SCHOOLS STUDIED


Ten suburban school systems were se- 
lected to participate in the study. Of the 
10, two are large senior high schools, two 
are small senior high schools, and six are 
elementary schools. Two high schools and 
three elementary schools which use a merit 
salary system were selected first. Then an 
effort was made to match each of the five 
schools with a system using a single salary 
schedule. Factors considered in matching 
the schools included location in suburban 
communities with similar socio-economic 
population, number of teachers, true value 
of property, and current expenditure per 
pupil in average daily attendance. The 
matching represented the closest corre- 
spondence on all factors which could be 
obtained from a sample of possible schools 
in the immediate geographical area. 

From the sample of the 10 schools in- 
volved, 614 inventories were administered 
and collected. The sample tested repre- 
sented 336 Ss from merit salary schools 
and 278 Ss from single salary schools. With 
a few exceptions, the sample represented 
all the teachers employed in each school 
tested, with the number of inventories collected
representing 100% of those administered.
The inventories were scored by
assigning a weight of 0 to the response
ending which represented low morale, a
weight of 1 to the next response interval
above the low response category, a weight
of 2 to the response interval below the high 
morale response, and a weight of 3 to the 
high morale response ending. An individ- 
ual's score on the inventory would repre- 
sent the addition of the assigned weights 
for the response endings selected by each 




TABLE 1
MEANS AND STANDARD DEVIATIONS OF THE
INVENTORY SCORES FOR EACH OF THE
TEN SCHOOLS

       Merit              Nonmerit
   Mean      SD        Mean      SD
  114.82   13.45      109.31   12.72
  113.90   14.83      112.68   14.27
  107.95   13.96      118.68   13.74
  113.68   10.55      115.40   11.79
  119.19   13.30      113.91   13.07


individual. The instructions printed on 
the inventory emphasized the selection of 
only one response for each statement. The 
number of inventories in which items were 
skipped represented less than 2% of the 
sample. 
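
The scoring rule described above can be sketched as follows. This is an illustration under stated assumptions: the function name `score_inventory` and the example respondent are hypothetical, and the treatment of incomplete inventories (rejected here) is an assumption, since the text does not say how skipped items were handled.

```python
N_ITEMS = 50            # statements in the inventory, per the text
WEIGHTS = (0, 1, 2, 3)  # low morale ending ... high morale ending


def score_inventory(responses):
    """Sum the weights of the endings chosen, one ending per statement."""
    if len(responses) != N_ITEMS:
        raise ValueError("expected exactly one response per statement")
    if any(r not in WEIGHTS for r in responses):
        raise ValueError("each response must carry a weight of 0, 1, 2, or 3")
    return sum(responses)


# Hypothetical respondent: 30 high-morale endings, 20 next-to-high endings.
example = [3] * 30 + [2] * 20
print(score_inventory(example))  # 130
```

Under this rule the possible scores run from 0 to 150.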


RESULTS 


The range of scores from the sample of 
614 individuals in the 10 schools involved 
in the study was from a low of 66 to a high 
of 145. The mean of the total distribution 
of scores was 113 with a standard devia- 
tion of 13. Table 1 presents the means and 
standard deviations for each school of 
scores on the attitude inventories taken by 
teachers at that school. 

When the total sample had been tested 
the data were tabulated in a manner that 
would allow for frequency totals with re- 
spect to schools and areas within the mo- 
rale inventory. The data were then 
grouped so that a test could be made of 
differences between all 10 schools, as well 
as differences between schools grouped on 
the basis of pay plan. Also, the data,
grouped on the basis of areas within the
morale inventory, were examined to determine
if any of the five areas produced
a score significantly different from the
other areas. The statistical procedure used
was an analysis of variance technique.

The analysis of variance for the test
data grouped on the basis of schools, regardless
of the type pay plan used by the
school, produced an F of 5.15, significant
CLAUDE MATHIS 


at the .01 level of confidence. The analysis 
of variance for the test data grouped on 
the basis of merit as compared to nonmerit 
schools produced an F of 2.03 which was 
not significant at either the .01 or .05 levels 
of confidence. The analyses of variance 
designed to test for differences in response 
between areas of attitude involved in the 
inventory produced Fs which were not 
significant. 
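
For readers who wish to see how an F ratio of this kind is formed, a minimal one-way analysis of variance is sketched below. The scores are hypothetical and are not data from the study; the function name `one_way_anova_f` is likewise an assumption of this illustration, and the sketch does not reproduce the author's computations.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical inventory scores for three schools (illustrative only).
schools = [[110, 115, 120], [100, 105, 110], [120, 125, 130]]
f_ratio, df_b, df_w = one_way_anova_f(schools)
print(f_ratio, df_b, df_w)  # 12.0 2 6
```

The obtained F would then be compared with the tabled value for the corresponding degrees of freedom at the chosen level of confidence.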

From the results of these analyses of 
variance, it appears that the level of mo- 
rale of personnel in the schools of the sam- 
ple studied is not directly related to the 
type pay plan (merit or nonmerit) used 
in the school. Significant differences at the
.01 level of confidence were found between
schools relative to the manner in which
personnel in the sample responded to the
attitude inventory. The inventory seems
to have been approached in a different
manner from school to school with respect
to the referential meaning it had for the
Ss. If the inventory measures morale, one
would expect differences between schools,
since it is only logical to assume that morale
conditions vary from one school to
another. The sensitivity of the inventory
is suggested by this difference between
schools when one realizes that the schools
used for the research have salary schedules
well above the national average and quite
similar to each other throughout the total
salary range, whether the school happens
to have a merit pay plan or a nonmerit
pay plan.

The lack of significance between merit
as compared to nonmerit schools in the
sample suggests that level of morale does
not differ over and above chance expectancy
when comparing personnel in merit
and nonmerit schools. If morale differences
exist between schools in the sample
but not between schools classified on the
basis of salary plan, then morale level
should be the result of a multitude of factors
in the school environment operating
together with, or aside from, a merit or
nonmerit type of salary policy.


SALARY POLICY AND TEACHER MORALE 279 


No significant differences between areas
involved in the inventory were found with
respect to a comparison of the frequency
of choice of response endings for questions
in each area. Such a finding can be interpreted
as meaning that individuals tended
to approach the inventory as a whole,
which could indicate that level of morale
is a general factor reflected in a person's
reaction to a school system as a whole. Ss
used in this research responded consistently
throughout the total inventory and
did not reflect significantly different levels
of morale relative to each area. Each attitudinal
area appears to contribute in an
approximately similar manner to the over-all
indication of morale given by the inventory.

While the results presented in this research
are thought provoking with respect
to a consideration of the merit pay issue
in education, they should not be considered
applicable to school situations other
than the sample used here. The similarity
of salary schedules in the sample reported
here could be one reason for the lack of
significant difference between merit and
nonmerit schools. Perhaps there is little
difference in level of morale because all
the teachers are receiving an adequate salary
regardless of the type pay plan.

One final word of caution should be
stated. The inventory used in this research
was designed as a measure of teacher morale.
While necessary precautions were
taken to insure its adequacy as a research
instrument, it is subject to the same criticisms
concerning reliability and validity
which are common to most attitude inventories,
one major criticism being that the
inventory measures an attitude about an
attitude, and not a direct behavioral response.


SUMMARY 


Personnel in 10 suburban school systems 
were given an attitude inventory designed 
to measure level of morale. The inventory 
was administered for the purpose of deter- 
mining what differences in level of morale, 
if any, exist between schools which use a 
merit type salary schedule and schools 
which use a nonmerit type salary schedule. 
The statistical analysis of the data permits
the following conclusions:

1. No significant difference in morale 
level was found between schools grouped 
on the basis of type of salary schedule. 

2. A significant difference in level of mo- 
rale, as measured by the attitude inven- 
tory, was found between the 10 schools 
involved in the sample. 

3. No significant differences in indica- 
tion of morale level were found between 
areas of the attitude inventory, suggesting 
that individuals within each school tended 
to approach the inventory as a whole 
rather than projecting differential feelings 
of morale into specific areas of the
inventory.

REFERENCES 


Coffman, W. E. Teacher morale and curriculum
development: A statistical analysis
of responses to a reaction inventory.
J. exp. Educ., 1951, 19, 305-332.

Haire, Mason. Industrial social psychology. 
In Gardner Lindzey (Ed.), Handbook of 
social psychology. Cambridge, Mass.: 
Addison-Wesley, 1954. Pp. 1104-1123. 


(Received May 13, 1959) 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 6, 1959 


DIMENSIONS OF LEADERSHIP BEHAVIOR IN 
CLASSROOM DISCUSSION GROUPS¹


CLAUDE J. BARTLETT 
George Peabody College for Teachers 


Although many studies have been done 
in the search for the dimensions or traits 
of leadership behavior, an almost equally 
large number of dimensions of leadership 
have emerged. Certainly many of the di- 
mensions found have been similar, and 
their difference may in some cases be a 
function of the interpretation of the stud- 
ies. However, enough evidence has been ac- 
cumulated (Stogdill, 1948) to indicate that 
leadership does not involve the same be- 
havior in all situations. 

The purpose of this study was to iden- 
tify the dimensions of leadership behavior 
in classroom groups. It was conducted in 
a classroom situation where the group dis- 
cussion method was used as the principal 
teaching method. This situation offered an 
excellent opportunity in which to study 
leadership behavior, since the class mem- 
bers worked in a discussion situation for 
approximately 12 weeks. Thus, the group 
members spent enough time in group dis- 
cussion to demonstrate their abilities for 
leadership. In this particular situation, 
there was also considerable motivation to 
participate in discussion since a portion of 
the class grade is dependent upon contri- 
bution to group discussions. 


METHOD 


The subjects used for this study were 
students in an introductory course in edu- 
cational psychology in which the small 
group discussion technique was used as 
the principal method of instruction. At the 
beginning of the quarter each class was 
divided into several groups of four to six 
students each. Each group worked separately
on various topic questions and projects.
The class procedure afforded an opportunity
for the students to evidence
leadership behavior and to become acquainted
with other members of the group
in a discussion situation.

¹ This study was done at the Ohio State
University as part of a doctoral dissertation
under the supervision of Robert J. Wherry.

Leadership was defined as being perceived
as a leader by the other members
of a discussion group. Students from three
different classes (about 75 students) were
asked to write essays describing the behavior
of the most outstanding leader in
their discussion group. They were asked
to describe the behavior of the person
chosen as a leader rather than write an essay
on an ideal leader. Thus, the descriptions
received tended to contain concrete
behavioral descriptions of persons perceived
as group leaders. These essays provided
a source of short descriptions of the
behavior which could be used as items in
a descriptive check list. Three hundred
phrases were obtained.


The Wherry-Winer Method (1953) [...] items
[...] (b) Personal acceptability (c) Mot[...]
(d) Group organization (e) Democratic-autocratic
(f) Self-organization. Items
were selected to represent these categories
on the basis of expert (advanced graduate
students in psychology) agreement. A criterion
of 80% agreement or better of the
five experts used was required before an
item was included in a category. [...]
20 items were used to define each category.
The check list was administered to 100
students of the introductory educational
psychology class, divided into three approximately
equal groups. The first group
was asked to think of someone in the upper
half of the class in terms of leadership
qualities. The second group was asked to 
think of a person from the lower half in 
terms of leadership. The rest were asked 
to think of someone who was about aver- 
age. After a student had called to mind a 
particular group member, he was asked to 
consider the 300 phrases, one at a time, 
as they applied to this individual respond- 
ing to each phrase on a five-point scale of 
applicability. 

A rating on over-all leadership by a
group member was also obtained for each
student. The applicability rating on each
phrase was correlated with the leadership
rating. Substantial positive correlations
were obtained for almost all phrases, indicating
that the phrases described the positive
aspects of leadership.
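
The correlation of a phrase's applicability ratings with the over-all leadership rating can be sketched as an ordinary product-moment correlation. The data below are hypothetical, the function name `pearson_r` is an assumption of this illustration, and the text does not state the scale used for the over-all leadership rating, so a five-point scale is assumed here.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings for six students (illustrative only):
# applicability of one phrase (1-5) and over-all leadership (assumed 1-5).
phrase_ratings = [5, 4, 4, 3, 2, 1]
leadership_ratings = [5, 5, 4, 3, 2, 2]
print(round(pearson_r(phrase_ratings, leadership_ratings), 2))
```

Repeating the calculation for each of the 300 phrases would give the set of phrase-by-leadership correlations referred to above.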

Category scores were obtained for each
of the 100 students on the six tentative
categories. The six categories were intercorrelated
and the Z portion of the Wherry
test selection procedure (Stead & Shartle,
1940) was used to select categories. Categories
whose unpredicted variance fell below
.20 were eliminated. This procedure is
suggested by Wherry and Winer (1953).
Wherry and Winer also suggest that these
categories may be assumed to be good estimates
of the centroids that would be defined
by the corresponding interitem correlations.
Thus, the correlations between
the selected categories and the 300 items
may be considered as the oblique factor
loadings as defined by these categories. A
method suggested by Wherry (1959) was
used to transfer these estimates of oblique
loadings with simple structure preserved
among the group factors. Two hand rotations
were performed to clarify the simple
structure and interpretation of the group
factors.
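
The idea of treating item-category correlations as estimates of oblique loadings can be illustrated roughly as follows. This is a sketch under stated assumptions, not the Wherry-Winer procedure itself: category scores are assumed to be simple sums of applicability ratings, all names (`pearson_r`, `category_scores`, `item_category_correlations`) and all data are hypothetical, and neither the test selection step nor the transformation suggested by Wherry (1959) is attempted.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def category_scores(ratings, category_items):
    """Sum each rater's ratings over the items assigned to each category."""
    return {
        name: [sum(row[i] for i in items) for row in ratings]
        for name, items in category_items.items()
    }


def item_category_correlations(ratings, category_items):
    """Correlate every item with every category score."""
    scores = category_scores(ratings, category_items)
    n_items = len(ratings[0])
    return {
        name: [pearson_r([row[j] for row in ratings], s) for j in range(n_items)]
        for name, s in scores.items()
    }


# Hypothetical check list: 5 raters x 4 items, two tentative categories.
ratings = [
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [3, 3, 3, 3],
    [2, 1, 4, 5],
    [1, 2, 5, 4],
]
categories = {"A": [0, 1], "B": [2, 3]}
loadings = item_category_correlations(ratings, categories)
print([round(v, 2) for v in loadings["A"]])
```

In this toy example the first two items correlate highly with category A and negatively with category B, which is the pattern one would look for when assigning items to categories.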


RESULTS


A general factor and four group factors
emerged from the analysis. Almost all of
the 300 phrases received substantial loadings
on the general factor and one of the
group factors. In the interest of economy 
of space, only the 15 phrases which show 
the highest loading on each factor are re- 
ported. A complete list of the loadings of 
all phrases on all factors appears in the 
investigator’s doctoral dissertation (Bart- 
lett, 1959). 

I. Halo. This factor shows substantial 
loadings on almost all of the phrases. Thus, 
it seems to be a general tendency to rate 
a person high or low on the basis of some 
general impression toward the person be- 
ing rated. The phrases showing the highest 
loadings on this factor also correlated 
highly with a rating on overall leadership, 
indicating that this halo may have been a 
function of characteristics which are pos- 
sessed by leaders. For this reason an in- 
terpretation of general leadership ability 
might also be appropriate for this factor. 
However, since the ratings on over-all 
leadership were also subject to the halo ef- 
fect, an interpretation of halo or general 
bias seems more appropriate. 

The fifteen phrases which have the high- 
est loadings on this factor are given with 
their respective loadings as follows: 


Ideas show good judgment (.84) 

Ideas are excellent (.83) 

Makes many worthwhile comments (.83) 
Answers wisely (.82) 

Has good judgment (.82) 

Gives well thought out answers (.82) 
Has many intelligent ideas (.81) 
Suggestions are worthwhile (.81) 

Has good ideas (.81) 

Is a good thinker (.81) 

Does important things well (.81) 

Give better arguments than others (.80) 
Serves as guide to rest of group (.80) 
Says things that are constructive (.80) 
Is on the ball (.80) 


II. Contribution of Ideas and Informa- 
tion. This factor seems to be made up of 
phrases which describe a person who is 
not only perceived as being intelligent, but 
also as having a fund of information to 
contribute to the group. In a classroom 
discussion situation, the presence of this 
kind of person seems appropriate. 




The 15 phrases with highest loadings on 
this factor follow: 


He knows his stuff (.36) 

Very intelligent person (.36) 

Knows what he is talking about (.36)

Can grasp essentials of situation (.34) 

Gets best test grades in class (.34) 

Expresses ideas well (.33) 

Thinking is original (.32) 

Suggestions are worthwhile (.32) 

Well-informed (.31) 

Knows his stuff (.31) 

Is a real brain (.30) 

Able to evaluate suggestions of others
(.30)
Has many intelligent ideas (.30)

Has good ideas (.30) 

Knows more than others in the group 
(.29) 


III. Contribution of Friendly Atmos- 
phere. The phrases loading on this factor 
seem to describe the type of person who is 
easy to get along with and who might serve 
as a morale booster in a group discussion. 
Although this type of person may not con- 
tribute to the goal of the group directly, 
the friendly atmosphere created may work 
indirectly in aiding a group to achieve its 
goals. 

The phrases representing this factor 
with their respective factor loadings are 
as follows: 


Well-liked by all (.57) 

Easy to get along with (.55) 

Friendly (.55) 

Always very pleasant (.54) 

Manner is pleasant (.54) 

Well-liked by others (.52) 

Very friendly (.52) 

Seems to have a likeable personality (.51) 

Has an agreeable disposition (.50) 

Always willing to listen to the other side 
of an argument (.47)

Always has a smile (.47)

Very agreeable (.47)

Does not overexercise power as a leader
([...])

Does not acquire resentment of others in
group (.46)

Cooperative (.46)


IV. Contribution of Labor and Effort. 
The phrases which make up this factor 
describe a person who is the work horse 
of the discussion group. This type of person
contributes to the group by taking responsibility
for such things as writing reports,
bringing in material, and taking care
of details. 

The phrases which represent this fac- 
tor and their loadings follow: 


A hard worker (.65) 

More than willing to do his share (.59) 

Willing to do extra things (.53) 

Takes more than his share of work home 
(.52) 

Always ready to serve (.51) 

Willing to work (.50) 

Shows a persistent effort (.49) 

Not afraid to do more than his share (.49)

Is a good worker (.48)

Seems to enjoy work (.46)

Does not sit by and let others do the work
(.45)

Takes responsibility very seriously (.44)

Earnest about work (.43)

Shows enthusiasm in all projects (.41)
Does work outside of class (.41) 


V. Contribution of Policy and Decisions.
The phrases which define this factor
describe a person who sees himself as the
central figure of the group. This type of person
pushes the group toward a decision,
even at the risk of making it a personal
rather than group decision. This factor
seems to represent the behavior of a person
who might be placed in the traditional
category of autocratic leader. The phrases
defining this factor follow with their respective
loadings:


Often inclined to think he is the only one
right (.56)

Insists on his way (.56)

Prefers to do entire project by himself
(.49)

Often seems to dominate group discussions
(.44)

Doesn't listen to others' opinions (.40)

Not always democratic (.39)

Quite dominant (.35)

Forever involved in a [...] (.35)

Seldom changes his opinion ([...])

Leaves more "timid souls" of group behind
(.33)

Argues his point until convinced he is
wrong (.33)

Has a solution for every problem (.31)

Argues his point until he convinces others
(.30)

Always willing to add an extra word (.27)

Comments arouse controversy (.25)


DISCUSSION


The general factor was expected to
emerge from this analysis, since all of the
phrases which were used were descriptions
of the behavior of persons who were leaders
in a discussion group situation. Also in
check list rating scales, a general factor
would be expected, since response bias is
not controlled. Since most of the phrases
which have highest loadings on the general
factor also have high loadings on the factor
indicating a Contribution of Ideas and
Information, it would seem that this is the
kind of contribution a group member
should make if he aspires to being classified
as the group leader. The halo in ratings
of students in the classroom appears
to be set by academic type performance.

The four group factors were interpreted
in terms of four different ways of contributing
to group activity. These four dimensions
of group contribution suggest
four independent ways by which a person
can become a group leader. The person
who could contribute to a group in all of
these ways would, presumably, be the more
desirable type leader, but such persons
are in limited supply. It might be hypothesized
that a group made up of four persons,
each strong in one of the areas defined
by the group factor, would be an
efficient group. A rating scale has been
constructed using the phrases from this
analysis (Bartlett, 1958, 1959) which
might be useful to the instructor in setting
up the membership in groups in order to
test such an hypothesis. However, the use
of this rating scale in courses other than
educational psychology would be questionable.


In viewing the dimensions found in this
study as they are related to dimensions of
leadership found by others, two of the
factors from the present study
seem to be similar to factors found in
studying leadership behavior in other situations.
The two strongest factors (in terms
of percentage of total variance explained) 
of the Ohio State University Leadership 
Studies, as reported by Halpin and Winer 
(1952), were Consideration and Initiating 
Structure. The Consideration factor seems 
to be similar to the factor indicating Con- 
tribution of Friendly Atmosphere, which 
was found in the present study. The fac- 
tor, Initiating Structure, appears to be 
similar to the factor of the present study, 
Contribution of Policy and Decisions. 
The two factors, Contribution of Ideas 
and Information and Contribution of La- 
bor and Effort, appear to be a function of 
the classroom situation. In this situation 
the group orientation is toward learning 
the course-related materials. Thus, ideas 
and information play an important role. 
In these groups of students, the work done 
in achieving the goals of the group is done 
by the students who make up the group. 
Thus, the leader of the group would also 
be expected to do his share of the work. 


SUMMARY 


A factor analysis of 300 phrases describing
leadership behavior in classroom discussion
groups was done using the Wherry-Winer
Method (1953). The study was done
to examine the dimensions of leadership 
behavior in a classroom group discussion 
situation. The analysis yielded four group 
factors and a large general factor. The 
general factor was interpreted as a general 
tendency to make high or low applicability 
ratings of the phrases on the basis of the 
halo effect. The four group factors were 
interpreted in terms of the ways in which a
group member can contribute to the group
discussion: ideas and information, friendly
atmosphere, labor and effort, and policy and
decisions.


REFERENCES 
Bartlett, C. J. The relationships between
self-ratings and peer ratings on a leadership
behavior scale. Unpublished doctoral
dissertation, Ohio State Univer.,
1958.

Bartlett, C. J. The relationships between
self-ratings and peer ratings on a leadership
behavior scale. Personnel Psychol.,
1959, 12, 237-246.

Halpin, A. W., & Winer, B. J. The leadership
behavior of the airplane commander.
Columbus: Ohio State Univer.
Res. Found., 1952.

Stead, W. H., & Shartle, C. L. Occupational
counseling techniques. New York:
American Book, 1940.

Stogdill, R. M. Personal factors associated
with leadership: A survey of the literature.
J. Psychol., 1948, 25, 35-71.

Wherry, R. J. Hierarchical factor solutions
without rotation. Psychometrika, 1959,
24, 45-52.

Wherry, R. J., & Winer, B. J. A method
for factoring large numbers of items.
Psychometrika, 1953, 18, 161-179.


(Received May 21, 1959) 




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 6, 1959 


INDIVIDUAL DIFFERENCES IN MEMORY 


JAMES B. STROUD AND LOWELL SCHOER
State University of Iowa 


One of the persistent problems in the 
field of memory has been that of individual
differences in retentiveness. Historically,
differences in retentive ability have
been linked with differences in learning
ability. For example, McGeoch and Irion
(1952) wrote: “By and large, individual
differences in learning are reflected in individual
differences in retention.”

When subjects have been given equal
amounts of practice and amount learned
has been used to define differences in
acquisition, learning scores have been
found to be positively related to retention
scores. This is probably what McGeoch
and Irion had in mind. Another facet of
the problem concerns the relationship between
learning and retentive abilities when
all Ss attain a common performance criterion
and differences in the rate at which
this is done are used to define differences
in learning ability. The evidence pertaining
to this aspect of the general problem
is less conclusive. It is with this aspect that
the present paper is concerned.

The main body of evidence in support
of the proposition that retentive ability is
positively related to learning ability has
been derived from analysis of learning and
relearning scores. There is little doubt that
there exists a positive and substantial correlation
between trials to learn and trials
to relearn upon the part of an array of
Ss. This type of analysis, however, seems
to be a questionable procedure. Relearning
scores reflect learning ability to some
extent, perhaps to a considerable extent.
One would expect that Ss who learn rapidly
would relearn rapidly because, if for
no other reason, they are rapid learners.
Similarly, those who learn slowly would be
expected to relearn slowly, because they
are slow learners. At best, relearning scores
reflect both learning ability and retentive
ability.

On the other hand, the relationship be- 
tween recall and rate of learning is far 
from close, when all Ss achieve a common 
trials-to-learn criterion, as Underwood’s 
(1954) data, for example, and those herein 
reported show. In the customary verbal 
learning experiment, it is possible for the 
slow learner to achieve a level of mastery 
equal to that of the rapid learner if he 
takes enough time. In many types of 
learning situations, especially those re- 
garded as most significant educationally, 
it is, of course, not possible for all learners 
to attain equal mastery. Some will achieve 
levels of insight, or levels of skill, not at- 
tainable to others. Under these circum- 
stances, we would expect to obtain wide 
differences among good and poor learners 
in performance on retention tests. 

In the problem at hand, we were in- 
terested in individual differences in reten- 
tion among Ss, all of whom have attained 
a common criterion. Demonstrating that 
recall scores are unrelated to rate of learn- 
ing, when all Ss achieve the same level of 
learning, is not, of course, equivalent to 
demonstrating that significant differences 
in recall do not exist. A basic question 
facing us today is whether or not indi- 
viduals differ reliably among themselves in 
retention of materials which have been 
mastered equally well by all. If it could 
be shown that Ss do differ reliably in re- 
call ability and that these differences are 
not associated with differences in learning 
ability, then recall, and possibly retention, 
would be established as an independent 
variable—especially so if the differences 
possess some degree of generality. 




PROCEDURE 


With these thoughts in mind the senior 
author planned and carried out the ex- 
periments reported in the following pages. 
Originally it was planned to have each of 
a number of Ss learn, recall, and relearn 
four or five different kinds of material. 
Later, since it seemed desirable to have 
the Ss learn pairs of comparable lists so 
that some kind of assessment of the re- 
liability of the scores could be made, this 
plan was given up. The materials adopted
consisted of two lists of paired adjectives, 
12 pairs per list, and two lists of picture- 
name pairs, 10 pairs per list. All lists were 
presented by the paired-associate antici- 
patory method. 
Adjective pairs of low associative, syn- 
onymity, and familiarity value were se- 
lected from the Haagen (1943) list. These 
were arranged in two comparable lists as 
mentioned. In the preparation of the pic- 
ture-name pairs, a suitable number of 
photographs of senior, male students in 
the College of Engineering were selected 
from a pool of pictures just taken for the 
University Student Annual. Paired with 
each such picture chosen was a fictitious 
first and last name. The adjective pairs 
and the picture-name pairs were then re- 
produced on motion-picture film in such 
a way that the first member of a pair 
could be exposed on a screen for two 
seconds to be followed immediately by a 
two-second simultaneous exposure of both 
members of a pair. In the case of the ad- 
jective pairs, the second member, the re- 
sponse member, appeared just to the right 
of the first or stimulus member. In the 
case of the picture-name pairs the name, 
the response item, appeared directly under
the picture, the stimulus item. An interval
of .5 second intervened between exposures
of adjacent pairs within a list. An
interval of five seconds occurred between
trials.

¹ The author wishes to [...] Carter, A. J.
[...], Clifford Howe, and S. Muehl.

The items in each list were presented in 
two different serial orders, the two orders 
alternating in a given series of trials. After 
the first trial, the Ss’ task was to anticipate 
(orally) the response member within the 
two-second interval in which the stimulus 
member alone was exposed. This was re- 
peated for each pair within the list, trial 
after trial, until the learning criterion (a 
trial without error) was reached. 

The items comprising each list, in two
different serial orders, were filmed separately,
making four reels, each consisting
of some 50 feet of film. Each such
length of film was made into a loop and
mounted in a winder. The films thus ran
continuously during any one learning session
for as many trials as required to
reach the criterion. Once the running of
the film was started for a given S the experimenter
had no duties beyond recording
the responses and stopping the projector
at the end of the learning session.
The four lists were presented to the Ss
in a counterbalanced order.²

The Ss were given five practice trials
on a 6-pair list of adjectives, just prior
to the learning of the first paired adjective
list. They were given five practice
trials on a 4-pair list of picture-names
just prior to learning the first picture-name
list.

The Ss (N = 149) were sophomore students
enrolled in the Introduction to Psychology
course, State University of Iowa.
They reported one at a time for five consecutive
days, starting on a Monday. On
Mondays the Ss learned the experimental
list appropriate to each for that day. On
Tuesdays the Ss first recalled and relearned
Monday’s lists and proceeded to
learn Tuesday’s list. This procedure continued
until Friday, on which day the Ss
recalled and relearned Thursday’s
lists.

² The method of presenting the material
just described was adopted because of its
convenience in handling the picture material.
It proved to be quite as satisfactory as a
memory drum in handling the verbal materials.
The method permits a great deal of
flexibility.


RESULTS 


In Table 1 are presented means of the 
learning and relearning scores in trials, 
mean recall scores in words anticipated, 
and SDs of the three respective scores. 
As may be inferred from the size of the 
SDs of learning scores, the distributions 
of trials to learn were rather markedly 
skewed to the right. 

As a first step in the analyses of the
data, within-orders intercorrelations were
computed (a) among the learning scores,
(b) among the relearning scores, and (c)
among the recall scores. Since we wished
to be able to determine practice effects,
we presented the lists in counterbalanced
order. In order to obviate any possible
attenuating effects of this procedure upon
the correlations, within-orders correlations
were computed, except where otherwise
indicated. The results are reported in
Table 2. In a sense the correlations
involving comparable lists, those in the top
two rows, may be regarded as reliability
measures. All the r's are significant at the
.01 level of confidence. Overall, the r's
between various relearning tasks are about
.05 higher than those between the various
recall tasks. Incidentally, we computed the
r's between saving scores for comparative
purposes. These were found to be .25
between PA1 and PA2, and .11 between PN1
and PN2.

Shown in Table 3 are the r's between
(a) learning and recall, (b) relearning
and recall, (c) learning and relearning,
and (d) learning and relearning by levels
of recall. The latter in effect is the
correlation between learning and relearning
with recall partialled out. The r's between
learning and recall are not significant in
the case of the two paired adjectives lists,
but are significant at the .01 level on the
picture-names lists. At best the relation-


TABLE 1
MEAN TRIALS TO LEARN AND RELEARN,
MEAN RECALL SCORES, AND SDs
OF PAIRED ADJECTIVES AND
PICTURE-NAMES LISTS


              PA1     PA2     PN1     PN2     N
Learning^a   11.85   10.50   13.08   14.33   149
  SD          7.57    5.86    0.05    7.30   149
Recall        8.20    8.40    6.09    5.55   149
  SD          2.19    2.10    2.18    2.32   149
Relearning    3.54    3.17    3.52    3.68   149
  SD          1.75    1.54    1.70    1.73   149


^a Does not include first presentation trial.


TABLE 2
WITHIN-ORDERS CORRELATIONS
AMONG VARIABLES


Lists       Learning   Relearning   Recall
PA1, PA2      .69         .34         .23
PN1, PN2      .79         .34         .30
PA1, PN1      .46         .29         .41
PA1, PN2      .44         .38         .27
PA2, PN1      .46         .32         .34
PA2, PN2      .45         .40         .24


Note. r = .159 (N = 150) significant at .05 level;
r = .208 (N = 150) significant at .01 level. Values
are averages, based upon z transformations, of the
r's of the four orders.
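
The note to Table 2 says the tabled values are averages of the
r's of the four orders, based upon z transformations. A minimal
sketch of that kind of averaging follows. The function name, the
optional n - 3 weighting, and the four example r's are
illustrative assumptions, not values or procedures reported by
the authors.

```python
import math


def average_r(rs, ns=None):
    """Average correlations through Fisher's z transformation.

    rs: correlations, e.g. one per presentation order.
    ns: optional sample sizes; when given, each z is weighted by n - 3
        (a common convention, assumed here rather than taken from the paper).
    """
    zs = [math.atanh(r) for r in rs]          # r -> z
    if ns is None:
        weights = [1.0] * len(zs)
    else:
        weights = [n - 3 for n in ns]
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(mean_z)                  # z -> r


# Hypothetical within-order r's for one pair of lists (not from the paper).
example_rs = [0.62, 0.71, 0.68, 0.74]
print(round(average_r(example_rs), 2))
```

Averaging in the z metric rather than averaging the r's directly
keeps large correlations from being understated, which is
presumably why the note specifies it.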


TABLE 3
WITHIN-ORDERS CORRELATIONS AMONG
LEARNING, RELEARNING, AND
RECALL SCORES


                                        Tasks
Scores                           PA1    PA2    PN1    PN2
Learning and Recall^a            .04   -.12   -.25   -.23
Recall and Relearning^a         -.40   -.59   -.34   -.30
Learning and Relearning          .34    .40    .49    .55
Learning and Relearning by
  levels of recall (average)^c   .99    .97    .97    .48


^a Learning and relearning scores are trials scores.
Thus negative r's between these scores and recall signify
positive relationships.
^b Not within-orders.
^c Values are based upon weighted averages of z
transformations of the r's for the separate levels.




ship between learning and recall, as ob- 
tained, is quite low. This may be to some 
extent a function of relatively low reli- 
ability of recall scores. 

As an additional attempt to assess the 
relationship between learning and recall 
scores, the Ss were divided upon the basis 
of level of learning scores into fifths on 
each task and mean recall scores deter- 
mined for the respective fifths. These were 
found to be as follows: 


List    F1*    F2     F3     F4     F5
PA1     8.7    8.2    8.3    8.4    7.6
PA2     8.2    8.8    8.6    8.2    8.9
PN1     5.3    6.2    6.7    6.0    6.4
PN2     5.0    5.9    6.1    5.1    5.8
* The slowest fifth.


None of the differences among these means 
for any given list departs significantly 
from zero. In this treatment, recall and 
learning scores certainly appear to be es- 
sentially unrelated. 
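
The fifths analysis just described can be sketched as follows.
This is only an illustration of the grouping step: the function
name and the ten pairs of scores are hypothetical, and the way
ties and uneven group sizes were handled in the study is not
stated, so the slicing rule here is an assumption.

```python
from statistics import mean


def mean_recall_by_fifth(trials_to_learn, recall):
    """Split Ss into fifths by trials to learn; return mean recall per fifth.

    Index 0 holds the slowest fifth (most trials), matching F1 in the text.
    """
    # Pair each S's scores and order from slowest to fastest learner.
    paired = sorted(zip(trials_to_learn, recall),
                    key=lambda p: p[0], reverse=True)
    n = len(paired)
    fifths = []
    for k in range(5):
        lo = k * n // 5
        hi = (k + 1) * n // 5
        fifths.append(mean(r for _, r in paired[lo:hi]))
    return fifths


# Hypothetical scores for ten Ss (illustrative only, not the study's data).
trials = [24, 20, 15, 14, 12, 11, 9, 8, 6, 5]
recall = [8, 9, 8, 8, 9, 8, 8, 9, 7, 8]
print(mean_recall_by_fifth(trials, recall))
```

If recall depended on rate of learning, the five means would rise
or fall steadily from the first fifth to the last; a flat or
irregular pattern is what "essentially unrelated" describes.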

As seen in Table 3, significant correla- 
tions were obtained between recall and 
relearning scores for each of the four lists 
of material. This is a logical outcome. 
Even if recall and learning ability are not 
in fact related at all, we should expect to 
obtain significant correlations between
recall and relearning because the Ss who 
recall most have least to relearn. Indeed 
this suggests that relearning, at least after 
a 24-hour interval, is a valid measure of 
retention. Our earlier discussion has not 
denied this. We maintained only that it 
is not known to what extent relearning 
scores represent retention and to what 
extent, learning ability. 

It seemed logical to us that by
computing correlations between learning and
relearning by levels of recall, that is, by
partialling out recall effects, we could
obtain some idea of the extent to which
relearning scores are affected by recall. We
did this. The results are reported in Table
3. Overall, the partialling out procedure
resulted in slightly lower correlations.
Perhaps this is what one should expect in
the light of the r's obtained between
learning and recall. However, the distribution
of the r's for the various levels of recall
is quite revealing. The magnitude of the
r's varied inversely with the level of
recall, as is illustrated by the values for
lists PA1 and PA2 combined, as follows,
starting with the lowest level:


Level of Recall   Total N   r
1 30 .65 
2 28 .64 
3 45 .46 
4 62 .99 
5 46 .34 
6 34 .08 
7 15 .00 


Our data, together with those of workers
in the past, suggest at best no more than
a slight positive relationship between rate
of learning (to a common trials criterion)
and recall. Underwood (1954) has made
the point that the attaining of a common
trials criterion upon the part of fast and
slow Ss does not guarantee equal habit
strength by the two groups. At least it
probably does not do so for all of the
items in a list. This, if true, does not
forbid the use of the designations fast and
slow upon the basis of a trials criterion,
nor the making of comparisons of
retention of the two groups thus designated.
However, Underwood's point is highly
significant for the basic question of
individual differences in retention among Ss
who do in fact achieve equal mastery in
acquisition by a habit strength criterion.

By a special treatment of his data,
Underwood compared fast learners (upper
50%) with slow learners (lower 50%) on
one-day recall of items on which the two
groups, during acquisition trials, had
shown equal probabilities of correct
responses on subsequent trials, once correct
responses had been made. The two groups
did not differ in recall of the items so
defined. We analyzed our results for
levels of learning performance. Of course,
the Ss did not during the acquisition trials
manifest equal probability of making
correct responses on subsequent trials once
correct responses had been made. This
inequality is probably one of the basic 
differences between fast and slow learners, 
as others have suggested. Perhaps Ss differ 
in powers of discrimination and differ- 
entiation which permit them to make cor- 
rect responses in the first place. This may 
be regarded as one basis for differences 
in rate of acquisition. Another basis, as
suggested by inference, is differences in
the degree to which reinforcement (or at
least the making of a correct response)
contributes to habit strength. Cutting
across, or interacting with, both are
differences in susceptibility to intraserial
interference, as illustrated in length-of-list
effects. For example, Carter (1958) has
shown that increasing the length of
lists affects slow learners more adversely
than fast learners. Indeed, differences in
susceptibility to intraserial interference,
as between fast and slow learners, may
account for the observed differences in
effectiveness of reinforcement.

We computed (a) the number of
reinforcements taken to learn each list by
each fifth of the Ss, by rank order, (b)
the number of item presentations to reach
the criterion, and (c) the ratio of the number
of reinforcements (R) to the number
of item presentations (P). The results are
presented in Table 4.

It is seen that the number of
reinforcements involved bears a more or less
constant ratio to the number of item
presentations at all five levels of learning
performance, the mean value of R/P for
the four lists combined being .53, .54, .55,
.54, and .56. Perhaps this ratio would
vary for all ability levels, as it seems to
do here, with the over-all difficulty of
the list. This tells us that the mean
number of reinforcements, in an array of Ss,
bears a fairly constant ratio to the
number of item presentations or, what
amounts to the same thing, the number
of trials. The mean number of
reinforcements accomplished by the slowest fifth
of the Ss is 3.6 times as great as that by



TABLE 4
NUMBER OF REINFORCEMENTS (R) BY Ss
PERFORMING AT VARIOUS ACQUISITION
RATES, NUMBER OF ITEM PRESENTATIONS (P),
AND RATIO OF R TO P


           Mean     Mean Number     Mean Number
Fifths    Trials   Presentations   Reinforcements   R/P
                        (P)             (R)
PA1
  5        5.86        70.32           40.37        .57
  4        8.77       105.24           60.43        .57
  3       10.80       129.60           77.73        .60
  2       14.07       168.84           99.67        .59
  1       23.73       284.76          162.67        .57
PA2
  5        5.57        66.80           38.27        .57
  4        7.93        95.16           57.50        .60
  3       10.40       124.80           76.97        .62
  2       13.40       160.80           93.30        .58
  1       18.66       223.92          140.25        .63
PN1
  5        7.47        74.70           38.47        .51
  4       10.83       108.30           55.33        .51
  3       12.93       129.30           68.27        .53
  2       16.13       161.30           81.23        .50
  1       22.90       229.00          118.47        .52
PN2
  5        7.48        74.80           34.89        .47
  4       10.55       105.50           51.66        .49
  3       14.14       141.40           68.79        .49
  2       17.83       178.30           91.13        .51
  1       25.43       254.30          126.97        .50


the fastest fifth. Another way of stating 
the matter is that the Ss in the top fifth 
gained on the average .29 of an item per 
reinforcement, while those in the lowest 
fifth gained only .08 of an item per rein- 
forcement, averages for all four lists, or 
44 items. The middle fifth of Ss gained 
.15 of an item per reinforcement, by the 
same method of computation. A similar 
value for the second highest fifth is .20 
and for the second lowest fifth, .12. 
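
The items-per-reinforcement figures above can be checked against
the R column of Table 4. The sketch below assumes list lengths of
12 pairs for the adjective lists and 10 for the picture-name
lists; these lengths are not stated outright in this passage but
are implied by the ratio of presentations to trials in Table 4
and by the total of 44 items. The variable and function names are
illustrative.

```python
# Mean number of reinforcements (R) from Table 4, by list and fifth.
# Fifth 5 is the fastest group, fifth 1 the slowest.
R = {
    "PA1": {5: 40.37, 4: 60.43, 3: 77.73, 2: 99.67, 1: 162.67},
    "PA2": {5: 38.27, 4: 57.50, 3: 76.97, 2: 93.30, 1: 140.25},
    "PN1": {5: 38.47, 4: 55.33, 3: 68.27, 2: 81.23, 1: 118.47},
    "PN2": {5: 34.89, 4: 51.66, 3: 68.79, 2: 91.13, 1: 126.97},
}

# Assumed list lengths (presentations / trials in Table 4 gives 12 and 10).
ITEMS = {"PA1": 12, "PA2": 12, "PN1": 10, "PN2": 10}
TOTAL_ITEMS = sum(ITEMS.values())  # 44 items over the four lists


def total_reinforcements(fifth):
    """Sum of mean reinforcements over the four lists for one fifth."""
    return sum(R[name][fifth] for name in R)


def items_per_reinforcement(fifth):
    """Items mastered per reinforcement, averaged over all four lists."""
    return TOTAL_ITEMS / total_reinforcements(fifth)


for fifth in (5, 4, 3, 2, 1):
    print(fifth, round(items_per_reinforcement(fifth), 2))

# Ratio of slowest to fastest fifth in total reinforcements.
print(round(total_reinforcements(1) / total_reinforcements(5), 1))
```

Under these assumptions the computation reproduces the values
given in the text: .29, .20, .15, .12, and .08 of an item per
reinforcement, and the 3.6 to 1 ratio between the slowest and
fastest fifths.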

In the course of attaining the criterion 
of one perfect trial, the slowest fifth of 




the Ss had 3.6 times as many reinforce- 
ments as the fastest fifth. If we allow 
that the fastest learners have some small 
over-all advantage in recall over the slow- 
est learners, it would follow that the 3.6 
to 1 ratio in reinforcements is not quite 
sufficient to insure equal habit strength. 
For our two extreme groups and for the 
two adjacent groups, here referred to as 
Fast, Moderately Fast, Moderately Slow, 
and Slow, we computed the probability 
of correct responses occurring on the next 
trial after varying numbers of reinforce- 
ments. These results for the two lists of 
paired adjectives combined are shown in 
Table 5. 

Taking these results at their face value, 
we may observe that the slowest group 
of Ss required approximately five rein- 
forcements to establish a probability of a 
correct response occurring equal to that 
established by the fastest group after one 
reinforcement. The probabilities for the 


TABLE 5
PROBABILITIES OF CORRECT RESPONSES
FOLLOWING VARYING NUMBERS OF
REINFORCEMENTS FOR Ss REPRESENTING
FOUR LEVELS OF ACQUISITION RATE


                    Slow       Mod. Slow     Mod. Fast       Fast
Reinforcement    T^a  Prob.    T   Prob.     T   Prob.     T   Prob.
1                475  .549    329  .647     489  .755     489   .890
2                472  .674    310  .788     465  .871     385   .945
3                455  .796    294  .881     416  .913     247   .980
4                440  .854    270  .926     344  .939     138   .086
5                414  .894    294  .940     265  .988      42   .976
6                394  .898    221  .955     189  .995      22  1.000

Consecutive
Reinforcement
2                467  .800    288  .872     439  .032     362   .004
3                405  .894    259  .950     370  .938     320   .983
4                376  .915    249  .960     296  .942     125   .984


^a T = total number of possibilities, i.e., for the slow
group there were 475 instances in which these Ss had a
chance to make a correct response following one
reinforcement; 472 instances in which there was a chance
following two reinforcements, etc.



moderately slow group after three rein- 
forcements and for the moderately fast 
after two, approach the probabilities of 
the fast group after one reinforcement. 
For all four groups two consecutive re- 
inforcements seem to result in about the 
same probabilities as three reinforcements 
without regard to consecutiveness. 
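
The comparison drawn from Table 5 can be expressed as a simple
lookup: for each slower group, find the smallest number of
reinforcements at which its probability reaches that of the fast
group after one reinforcement (.890). The sketch uses the Table 5
probabilities for the three slower groups. The function name and
the tolerance of .02, used here to stand in for "approach", are
assumptions made for illustration.

```python
# Probability of a correct response on the next trial after k
# reinforcements (Table 5); index 0 corresponds to one reinforcement.
PROB = {
    "Slow":      [0.549, 0.674, 0.796, 0.854, 0.894, 0.898],
    "Mod. Slow": [0.647, 0.788, 0.881, 0.926, 0.940, 0.955],
    "Mod. Fast": [0.755, 0.871, 0.913, 0.939, 0.988, 0.995],
}
FAST_AFTER_ONE = 0.890  # fast group, one reinforcement


def reinforcements_to_match(probs, target, tolerance=0.0):
    """Smallest k whose probability is within `tolerance` of the target.

    Returns None when no tabled value comes that close.
    """
    for k, p in enumerate(probs, start=1):
        if p >= target - tolerance:
            return k
    return None


for group, probs in PROB.items():
    # tolerance=0.02 is an assumed reading of "approach" in the text.
    print(group, reinforcements_to_match(probs, FAST_AFTER_ONE, tolerance=0.02))
```

With the assumed tolerance the lookup gives five reinforcements
for the slow group, three for the moderately slow, and two for
the moderately fast, in line with the figures cited in the text.
A strict criterion (tolerance of zero) leaves the slow group at
five but moves the other two groups up by one reinforcement each.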

In the normal course of learning, there
will be some number of instances in which
the slowest Ss cannot for given times
establish a 5 to 1, 4 to 1, etc. ratio of
reinforcements in relation to the fastest
group of Ss because of the termination
of the experiment. It is possible, though
not highly probable, that there will be
some instances in which, for the slowest
learners, items will be reinforced but once.
There should be a greater number in
which but two or but three reinforcements
occur. Logically, these should be poorly
retained, relatively. Indeed this condition
might be sufficient to account for any
small difference in recall between fast
and slow learners. Among our slowest Ss
we found 24 instances in which there had
occurred but 1, 2, or 3 reinforcements. The
24-hour recall score for such items by the
Ss involved was 37.5%, as compared with
70% for all the items in the two lists by
all Ss in the slowest group. The items
involved in this treatment were pretty well
distributed over the entire lists. Obviously
little can be concluded from such meager
data.

It is suggested that when Ss learning
at different rates develop equal habit
strength they should not differ, by
acquisition rates, in retention, unless the
inappropriate method of relearning is used
as the measure of retention. However, the
usual method of requiring all Ss to learn
to a trials criterion does not quite
guarantee equal habit strength. In such cases
the rapid learners may show a slight
advantage in retention. Conceivably, this
might vary with the rigorousness of the
criterion to which the Ss were required to
learn.




Incidentally, if rapid learners behave on 
long lists as slow learners do on short 
lists, as Carter’s work suggests, then we 
might infer that Ss at any given acquisi- 
tion level should retain lists of varying 
lengths equally well when learned to a 
common habit strength criterion. 

This leaves open one of the major ques- 
tions raised above: do individuals differ 
among themselves in retention of mate- 
rial originally learned to the same level 
by a habit strength criterion? Our Ss all 
learned to a criterion of one perfect trial, 
but not to the same habit strength
criterion. The six intercorrelations among
the respective recall scores varied from
.23 to .41, all significant at the .01 level.
However, in the case of the PN1 and PN2
lists, significant correlations were obtained
between learning and recall.

As an attempt to partial out the effects
attributable to differences in rate of
acquisition, r’s were computed between recall
scores on these two tasks by levels of
learning ability. In doing this, the learning
scores on the two tasks, PN1 and PN2,
were averaged and eight levels of learning
rate established. The average r, by z
transformation, was .15. The results of the
various analyses performed leave this issue in
doubt.

Finally, for the data at hand, we sought 
to determine practice effects on learning 
and on recall. The means of the scores by 
orders in which the performances occurred 
are presented in Table 6. As seen, practice 
affected rate of acquisition, as was to be 
expected, but did not affect recall to any 
appreciable degree. 


SUMMARY


This paper has recounted some of the
salient features of an investigation of
individual differences in memory. One
hundred forty-nine Ss, in individual sessions,
learned and recalled and relearned four
lists each, two of paired adjectives and two
of paired picture-names. Significant
correlations, ranging from .23 to .41, were


TABLE 6
PRACTICE EFFECTS UPON LEARNING
AND RECALL


          Mean Trials to Learn^a            Mean Recall
Order    PA1     PA2     PN1     PN2     PA1    PA2    PN1    PN2
1       13.28   12.31   14.79   17.17   8.93   9.27   6.29   6.66
2       10.69   11.15   12.51   13.05   8.38   8.15   5.78   5.14
3       10.38    9.39   12.18   13.81   8.14   9.46   0.83   6.12
4       10.56    9.40   11.19   12.38   7.56   7.33   5.31   4.65


^a Does not include first presentation trial.


obtained among the various recall scores. 
On the two paired adjectives lists, r's
between trials to learn and words recalled
(after 24 hours) were not significant. Sig- 
nificant r's of —.23 and —.25 were ob- 
tained between these variables on the two 
picture-names lists. Ss were divided into 
fifths upon the basis of trials to learn on 
each of the four lists. No significant differ- 
ences were obtained among the recall 
scores of the various fifths on any of the 
lists. At best, the results of the various 
analyses suggest no more than a slight 
relationship between rate of learning and 
recall. 

The data have been analyzed with re- 
spect to differences in habit strength 
among Ss of varying rates of learning, all 
of whom attained a common trials-to-learn 
criterion. It was found that the mean 
number of reinforcements bore a fairly 
constant ratio to the number of item 
presentations (or number of trials) for 
the different levels of learning ability. The 
mean number of reinforcements accom- 
plished by the slowest fifth of Ss was 3.6 
times as great as that by the fastest fifth. 
Further analysis of specific items showed 
that the slowest fifth required five rein- 
forcements to establish a probability of 
making a correct response on the succeed- 
ing trial equal to that established by the 
fastest fifth after one reinforcement. The 
important question of whether or not Ss 
who have learned to a common habit 
strength criterion differ significantly 




among themselves in retention is unan- 
swered. The data are in accord with the 
proposition that significant differences in 
retention do exist among Ss who have 
achieved a common trials-to-learn crite- 
rion. 

Practice effects reduced the trials re- 
quired to learn, but did not result in in- 
creased recall scores. 


REFERENCES 


Carter, L. J. Interrelationships among
memory, rate of acquisition, and length of
task. Unpublished doctoral dissertation,
State Univer. of Iowa, 1958.

Haagen, C. H. Learning and retention as a
function of the synonymity of original
and interpolated tasks. Unpublished
doctoral dissertation, State Univer. of
Iowa, 1943.

McGeoch, J. A., & Irion, A. L. The
psychology of learning. New York: Longmans,
Green, 1952. P. 325.

Underwood, B. J. Speed of learning and
amount retained: A consideration of
methodology. Psychol. Bull., 1954, 51,
276-282.


(Received June 8, 1959) 


JOURNAL OF EDUCATIONAL PSYCHOLOGY


Vol. 50, No. 6, 1959 


A NOTE ON THE EFFECT OF FILLING OUT AN 
“ANXIETY SCALE” ON EXAMINATION 
PERFORMANCE 


NEIL A. CARRIER 
Southern Illinois University 


Certain statistically significant relation- 
ships were found in an earlier study by 
the writer (Carrier, 1957) designed to de- 
termine whether certain personality meas- 
ures are related to course examination 
performance under stress conditions. But 
the conclusions appeared tenuous because 
mean score differences between high and 
low stress experimental groups were gen- 
erally small. Post hoc speculation sug- 
gested several tenable explanations. One 
of these involved the old problem of the 
extent to which the investigator’s meas- 
uring operations themselves may un- 
wantedly influence that which is measured. 

As a part of the study procedures, a
brief “scale” had been administered to
assess the subjects’ (Ss’) awareness of
anxiety or tenseness both before and
during the examination. It is possible that in
attempting to measure the Ss’ awareness
of anxiety, the measuring instrument
itself had increased the anxiety responses of
the Ss through some kind of “feedback”
or suggestion process. Further, if some
differential effect were present whereby
the low stress Ss increased more in anxiety
than did the already highly anxious high
stress Ss, the result would have been to
reduce the desired difference in
stressfulness between the high and low stress
groups. As a consequence (providing stress
does affect examination performance),
there would have been less difference in
examination performance between the two
groups than would otherwise have been
found.

A study by Coppock (1955) added
further credence to this possibility. Drawing
support from other sources (Horney,
1937; Mowrer, 1950) he proposed that an
awareness of one’s emotional reactions
might thereby increase the amplitude or
duration of such reactions. His study’s
results gave some support to the
hypothesis that “suggestion and stress can
influence the extent of an individual’s
overreaction to information about his own
autonomic responses” (Coppock, 1955, p.
28). Since the present author's earlier
study attempted to get self-reports of
autonomic responses, one effect of the
measure designed to get these reports may have
been to increase emotional responses where
it was especially not wanted, i.e., in the
low stress group.

Consequently, a partial “replication” 
was planned to determine whether filling 
out anxiety scales does in fact depress 
examination performance. 


PROCEDURE 


The Ss were 1074 students in five lec- 
ture sections of the introductory psychol- 
ogy course at the University of Colorado 
during the fall 1956 semester. Two forms 
of a 75-item multiple-choice examination
were administered. Both forms contained 
the same 75 questions, but the order of 
questions was different in the two forms. 
The two forms were alternated in distri- 
bution to the students. 

One form was arbitrarily designated as 
the “anxiety” form, the other was called 
the “neutral” form. Two pages, each con- 
taining seven graphic rating scales, were 
included with the anxiety form. The first 
page appeared at the beginning of the 
exam. It requested the S to “place a large 
X at some place along each line which will 




best indicate the strength of your feeling
on each question.” The first question
asked, “How Tense or Anxious about this
exam do you feel at this moment?” At
one end of a six-inch line was the phrase
“very tense,” at the other end appeared
“very relaxed”; “midpoint” appeared in
the middle. The remaining questions and 
scales asked how Well Prepared the stu- 
dent felt for the exam, to what extent he 
was aware of an “Uneasy Feeling,” how 
Confident he felt he would do well on the 
exam, the extent to which he noticed an 
Accelerated Heartbeat at the moment, 
whether he was Perspiring more than 
usual at the moment, and to what extent 
he felt Worried about the exam. 
Approximately in the middle of the 
exam the second page of seven graphic 
rating scales appeared. Printed instruc- 
tions were to “indicate your feelings at 
this point in the examination. Do not refer 
to your answers to the scales at the be- 
ginning of the exam.” The questions and 
scales here were the same as those pre- 
sented at the beginning of the examina- 
tion. 
The second, or neutral, form also con- 
sisted of two pages. The first page ap- 
peared at the beginning of the exam. It 
requested the following information: 
name, sex, age, year in school, local ad- 
dress, and home state. The second page 
appeared approximately in the middle of 
the exam. It asked for “the following in- 
formation before continuing this exam: 
name, city of birth, state of birth, mother 
living, father living, number of brothers, 
and number of sisters.” The neutral form 
sheets obviously were intended to occupy 
an amount of time and attention of the 
control group of Ss comparable to that 
demanded from the experimental group of 
Ss, without, however, confronting the 
former with anxiety-producing materials. 




RESULTS AND DISCUSSIONS 


The mean examination score for the 553 
students who filled out the anxiety scales 
was 53.33; the mean score for the 521 
students who filled out the personal data 
sheets was 53.36. The standard deviation 
of the former group was 8.89; that for the 
latter group was 8.95. Both distributions 
were normal in shape by visual inspection. 
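
To give a sense of how small this difference is relative to
sampling error, the reported means, standard deviations, and
group sizes can be put through a two-sample t statistic. The note
itself reports no significance test, so the sketch below is an
illustration only; the choice of the unequal-variance (Welch)
form and the function name are assumptions.

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Two-sample t statistic from summary values (unequal-variance form)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / standard_error


# Anxiety-form group versus neutral-form group, values as reported above.
t = welch_t(53.33, 8.89, 553, 53.36, 8.95, 521)
print(round(t, 3))
```

The difference of .03 of a point is a small fraction of its
standard error, so the statistic is close to zero.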

These remarkably similar results give 
no support to the contention that filling 
out anxiety scales during an examination 
depresses performance by “reminding” the 
student of his emotional state. Whether 
there was an increase in emotional reac- 
tions as a result of the self-rating proce- 
dures was, of course, not directly assessed. 
But no effect on examination performance 
was detected. It may be added that had
a direct influence been found, research of
this kind would be more difficult than it
already is.

It appears, then, that one interpretation
of the small score differences between high
and low stress groups found in the earlier
study has been discounted. Remaining
alternative explanations (among others)
point to a failure to get a large “spread”
on stress between the two experimental
groups (probably, to lower the stress in
the low stress group), or to examination
stress having less effect upon performance
than is popularly claimed.


REFERENCES


Carrier, N. A. The relationship of certain
personality measures to examination
performance under stress. J. educ.
Psychol., 1957, 48, 510-520.

Coppock, H. W. Responses of subjects to
their own galvanic skin responses. J.
abnorm. soc. Psychol., 1955, 50, 25-

Horney, Karen. The neurotic personality of
our time. New York: Norton, 1937.

Mowrer, O. H. Learning theory and
personality dynamics. New York: Ronald,
1950.

(Received June 24, 1959) 




JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 6, 1959 


PSYCHOLOGICAL HEALTH AND CLASSROOM
FUNCTIONING:
A STUDY OF DISSATISFACTION WITH SCHOOL AMONG
ADOLESCENTS^1


PHILIP W. JACKSON AND JACOB W. GETZELS
University of Chicago


The problem of dissatisfaction with 
school among children is of theoretical and 
practical significance to both psychologists 
and educators. At the theoretical level dis- 
satisfaction with school becomes part of 
a broader area of inquiry which aims at an 
understanding of the individual’s function- 
ing in an institutional setting and which 
includes studies of staff morale, role con- 
flict, productivity, and the like. At a prac- 
tical level the question of why children like 
or dislike school is directly related to the 
immediate problems of school dropouts, 
grouping procedures, planning for the 
gifted child, and the like. 

As might be expected, a social phe- 
nomenon as important as dissatisfaction 
with school is not without its explanatory
hypotheses. Some of these spring from
empirical findings, while others appear to 
be part of our cultural ethos. Educational 
studies that point to an empirical linkage 
between school failure and school drop- 
outs, and industrial studies that demon- 
strate a relationship between low morale 
and decreased output, lead one to suspect 
that reduced effectiveness in school (i.e., 
low scholastic achievement) would be a 
natural concomitant of dissatisfaction with 
the institution. Thus one would expect to 
find heightened dissatisfaction among stu- 
dents who have low ability or who are un- 
able for one reason or another to deal ade- 
quately with scholastic material. 


^1 This study was supported by a research
grant from the United States Office of
Education. The present report is an expanded
version of a paper read at the American
Psychological Association meeting, Cincinnati,
Ohio, September 1959.


More recently it has been suggested (al- 
though never adequately demonstrated) 
that many successful students with high 
ability are dissatisfied with their school 
experiences; the term “boredom” is often
linked with the term “gifted child” in
current expositions by educators. The
boredom problem among the “gifted” combined
with the failure experiences of the low 
ability child suggests that the greatest 
number of dissatisfied students is to be 
found among extreme ability groups. Those 
who are low in ability and achievement 
would be expected to show dissatisfaction 
because of the numerous frustrations they 
experience in the classroom. Those who are 
high in ability and achievement would be 
expected to show dissatisfaction because of 
the relative lack of stimulation which they 
experience in the classroom. 

Both of these explanations (or, more 
accurately, hypotheses) contain the im- 
plication that dissatisfaction with an in- 
stitution arises out of the individual’s 
interaction with that institution. An al- 
ternative explanation might be that the 
individual brings a set toward satisfaction 
or dissatisfaction to the institution—that 
it is a reflection of a more pervasive per- 
sonal orientation and that success or fail- 
ure experiences within the institution have 
a limited influence upon it. This hypothe- 
sis obviously places more emphasis than 
do the earlier ones upon psychological
variables, as opposed to environmental
variables, in understanding dissatisfaction with
school. The research described here was 
designed to test the relative merit of these 


alternative views. 




PROBLEM 


The purpose of this investigation is to 
examine the differences in psychological 
functioning and classroom effectiveness be- 
tween two groups of adolescents—those 
who are satisfied with their recent school 
experiences and those who are dissatisfied. 


SUBJECTS AND PROCEDURE 


The Ss of this investigation were two 
groups of adolescents identified from 
among 531 students enrolled in a Mid- 
western private school. These students 
were divided into five class groups ranging 
from the prefreshmen to the senior year 
of high school. In this institution a single 
grade, the prefreshmen, is substituted for 
the usual seventh and eighth grades. The 
instrument used to select the experimental 
groups, called the Student Opinion Poll, 
was a 60-item opinionnaire designed to 
elicit responses concerning general satis- 
faction or dissatisfaction with various as- 
pects of school—viz., the teachers, the 
curriculum, the student body, and class- 
room procedures. The following are sample 
items, one in each of the four areas. 


3. While there are some differences among 
them, most teachers in this school are: 
a. Very inspiring 
b. Quite inspiring 
c. Somewhat inspiring 
d. Not inspiring 
16. Most of the subjects taught in the 
school are: 
a. Interesting and challenging 
b. Somewhat above average in interest 
c. Somewhat below average in interest 
d. Dull and routine 
14. From the standpoint of intellectual 
ability, students in this school are: 
a. Too bright—it is difficult to keep up 
with them 
b. Just bright enough 
c. Not bright enough—they do not pro- 
vide enough intellectual stimulation 
5. The freedom to contribute something
in class without being called upon by the
teacher is:
a. Discouraged more than it should be:
students do not have enough
opportunity to have their say
b. Encouraged more than it should be:
students seem to be rewarded just
for speaking even when they have
little to say
c. Handled about right


The instrument was scored by giving one 
point each time the S chose the “most 
satisfied” response to a multiple-choice 
item. Thus, the possible range of scores 
was from 0 to 60. For the total school 
population the mean score on the Student 
Opinion Poll was 37.30; the standard de- 
viation was 9.57. The experimental groups 
were chosen as follows: 


Group I, the “dissatisfied” group, consisted
of all students whose score on the
opinionnaire was at least one and a half
standard deviations below the mean of the
entire student body. This group contained
27 boys and 20 girls.

Group II, the “satisfied” group, consisted
of all students whose score on the
opinionnaire was at least one and a half
standard deviations above the mean of the
entire student body. This group contained
25 boys and 20 girls.
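The selection rule can be made concrete with the reported poll statistics. The following is a minimal sketch, assuming only the mean (37.30) and standard deviation (9.57) given above; the function name `classify` is illustrative and not part of the original study. Because poll scores are whole numbers, the cutoffs of about 22.9 and 51.7 correspond to scores of 22 or lower and 52 or higher.

```python
# Selection cutoffs implied by the reported Student Opinion Poll statistics.
MEAN = 37.30   # reported mean for the total school population
SD = 9.57      # reported standard deviation
K = 1.5        # groups lie at least 1.5 SD from the mean

lower_cutoff = MEAN - K * SD   # at or below: Group I ("dissatisfied")
upper_cutoff = MEAN + K * SD   # at or above: Group II ("satisfied")


def classify(score):
    """Return the experimental group for a poll score, or None if unselected."""
    if score <= lower_cutoff:
        return "dissatisfied"
    if score >= upper_cutoff:
        return "satisfied"
    return None


print(round(lower_cutoff, 3), round(upper_cutoff, 3))
```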


The experimental groups were compared 
on the following variables: 




1. Individual intelligence tests. In most
cases this was the Binet. A small number of
children were given the Henmon-Nelson, the
scores of which were converted by regression
equation into equivalent Binet scores.

2. Standardized verbal achievement test.
The Cooperative Reading Test was used.
Prefreshmen and freshmen were given Test
C1, Form Y; older students were given
Form T.

3. Standardized numerical achievement
tests. Because of differences in the curricula
of the various grade groups it was not possible
to administer the same test of numerical
achievement to all Ss. The following tests
were given according to grade placement:

Prefreshmen: Iowa Everypupil Arithmetic Test, Advanced Form O.
Freshmen: Snader General Mathematics Test.
Sophomores: Cooperative Elementary Algebra Test, Form T.
Juniors: Cooperative Intermediate Algebra Test.
Seniors: Cooperative Geometry Test, Form 2.

4. California Personality Test. Two forms
of this instrument were used. The intermediate
form was given to prefreshmen; the secondary
form was given to all of the older
groups. Two subscores were obtained, “personal
adjustment” and “social adjustment.”

5. Direct Sentence Completion Test. Ss 
were asked to complete 27 sentences of the 
type: “When I saw I was going to fail I
.........,” or, “I think my father is .........”
Each sentence was given a plus or minus 
score depending upon the presence or ab- 
sence of morbid fantasy, defeatism, overt 
aggression, and the like. The total score was 
the summation of the individual sentence 
scores. 

6. Indirect Sentence Completion Test. 
This instrument was identical with the Di- 
rect Sentence Completion Test except that 
proper names were inserted for the pronoun 
“I,” thus changing it from a “self-report” to 
a “projective” instrument. Boys’ names were 
used in the male form of the instrument and 
girls’ names in the female form. The instru- 
ment was presented as a “thinking speed” 
test. To reinforce this notion Ss were asked 
to raise their hands when they were finished 
and the elapsed time was written on their 
test booklet. This instrument was adminis- 
tered approximately two weeks prior to the 
administration of the Direct Sentence Com- 
pletion Test. 

7. Group Rorschach. Cards III, IV, IX, 
and X were projected on a screen. For each 
picture the S was presented with 10 re- 
sponses and was asked to choose the three 
which he thought to be most appropriate. 
Each list of 10 contained four “pathological” 
responses. The S’s score was the number of 
nonpathologic responses among his 12
choices. This group technique follows that
described by Harrower-Erikson and Steiner 
(1945). 

8. Teacher ratings. Each student was
given three ratings by his present teachers.
These ratings included: (a) his general desirability
as a student; (b) his ability to become
involved in learning activities; and (c)
his possession of leadership qualities. Teachers
were required to place all of their students
on a five-point scale so that Categories
1 and 5 each contained one-twelfth of the
students; Categories 2 and 4 each contained
one-fourth of the students; and Category
3 contained one-third of the students. The
values 5, 8, 10, 12, and 15 were assigned to
the categories and were used in quantifying
the ratings.

9. Adjective Check List. From a list of
24 adjectives each student was asked to
choose the 6 which best described his characteristic
feelings while attending classes in
particular school subjects. The list contained
12 “positive” (e.g., confident, happy, eager,
relaxed) and 12 “negative” adjectives (e.g.,
bored, restless, misunderstood, angry). The
use of the negative adjectives by the experimental
groups was analyzed both quantitatively
and qualitatively.
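The forced distribution used for the teacher ratings (item 8) fixes the average rating that any one teacher can assign. The sketch below works this out from the category shares and values stated in the text. The standard deviation it prints applies to a single teacher's ratings only; how ratings from several teachers were combined is not described, so no claim is made here about the spread of the combined ratings.

```python
from fractions import Fraction
from math import sqrt

# Share of each teacher's students placed in Categories 1-5, and the
# numerical value assigned to each category (both taken from the text).
shares = [Fraction(1, 12), Fraction(1, 4), Fraction(1, 3), Fraction(1, 4), Fraction(1, 12)]
values = [5, 8, 10, 12, 15]

assert sum(shares) == 1  # the five shares account for the whole class

mean = sum(p * v for p, v in zip(shares, values))
variance = sum(p * (v - mean) ** 2 for p, v in zip(shares, values))
sd = sqrt(variance)

print(mean)          # exact mean rating, as a Fraction
print(round(sd, 2))  # SD of one teacher's ratings under the forced distribution
```

Exact fractions are used so that the mean comes out as a whole number rather than a rounded decimal.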


RESULTS


With the exception of the adjective
check list the results of all comparisons
are shown in Table 1. Contrary to popular
expectations the “satisfied” and “dissatisfied”
students did not differ from each
other in either general intellectual ability
or in scholastic achievement. Those differences
which did appear were linked to psychological
rather than scholastic variables.
More specifically, each of the test instruments
designed to assess psychological
health or “adjustment” was effective in
distinguishing “satisfied” from “dissatisfied”
students within one or both sex
groups.

For both sexes the experimental groups
were differentiated by their scores on the
California Test of Personality. The experimental
groups of boys were further
differentiated by their responses to the Indirect
Sentence Completion Test. For girls
additional differences appeared in their responses
to the Direct Sentence Completion
Test and the Group Rorschach.

On all of these test variables the “satisfied”
group attained the “better” score,
i.e., the score signifying a more adequate
level of psychological functioning. It is
also worthy of note that whenever a sig- 
nificant difference appeared, the mean 
score of the total student population fell 
between the mean scores of the experi- 
mental groups. Thus, the variables that 
differentiate the experimental groups tend 
also to distinguish them from the total 
population of students. 
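The note to Table 1 states that test scores were converted to T scores (mean 50, standard deviation 10) using the total student body as the reference group. On that scale the total-population mean is 50 by construction, so group means on opposite sides of 50 necessarily straddle it. A minimal sketch of the conversion follows; the raw scores are hypothetical and the function name `to_t_scores` is illustrative.

```python
from statistics import mean, pstdev


def to_t_scores(raw_scores):
    """Rescale raw scores to T scores (mean 50, SD 10) using the
    mean and SD of the full population supplied in raw_scores."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population SD: the whole student body is the reference
    return [50 + 10 * (x - mu) / sigma for x in raw_scores]


# Hypothetical raw scores for a tiny "student body" (illustration only).
population = [12, 15, 18, 20, 25, 30]
t_scores = to_t_scores(population)
print([round(t, 1) for t in t_scores])
```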

In addition to showing differences on 
psychological health variables, “satisfied” 
and “dissatisfied” boys were perceived dif- 
ferently by their teachers. On all three of 
the teachers’ ratings the “satisfied” boys 




TABLE 1

MEAN SCORES, STANDARD DEVIATIONS, AND t STATISTICS FOR SATISFIED AND DISSATISFIED
ADOLESCENTS ON DEPENDENT VARIABLESᵃ

                                            Boys                                    Girls
                                  Dissatisfied       Satisfied             Dissatisfied       Satisfied
                                      (N = 27)        (N = 25)                 (N = 20)        (N = 20)
Variable                          Mean       s    Mean       s   t         Mean       s    Mean       s   t
IQ                              134.85   14.58  136.44   14.59   ns      128.45   15.06  128.00   11.45   ns
Verbal Achievement               49.96    8.69   50.68    7.87   ns       50.63    9.11   52.28    0.76   ns
Numerical Achievement            50.35    9.75   52.17   10.52   ns       47.78    8.61   48.50   10.20   ns
Calif. Personal Adjust.          45.58    9.82   53.40    7.03   3.18**   47.90   13.03   54.70    9.25   1.86*
Calif. Social Adjust.            44.85   11.37   51.84    8.93   2.49**   47.00   13.15   55.70    7.89   2.50**
Direct Sentence Comp.            40.93   10.58   49.25   10.02   ns       46.65   12.00   54.00    5.73   2.09*
Indirect Sentence Comp.          47.19    9.61   51.20    6.95   1.75*    49.00   10.35   53.47    7.97   ns
Group Rorschach                  48.35   10.60   47.44   10.30   ns       47.35   11.35   54.16    8.32   2.15*
Teacher Rating I:
  Desirability as a student       8.04    1.83   10.35    1.70   2.85**    9.84    1.91   10.05    1.59   ns
Teacher Rating II:
  Leadership qualities            9.00    2.08   10.13    1.96   2.00*     9.00    2.37   10.00    1.21   ns
Teacher Rating III:
  Involvement in learning         9.00    2.14   10.23    1.69   2.14**    9.07    2.32   10.33    2.11   ns

* Significant at the .05 level.
** Significant at the .01 level.

ᵃ With the exception of IQ, all scores were based upon parameters of the total student body from which the
experimental groups were drawn. The scores of all tests were transformed to T scores with a mean of 50 and a
standard deviation of 10. For the total population the teacher ratings have a mean of 10 and a standard
deviation of 2. The mean IQs for the total school population are: boys, 132, and girls, 128.


TABLE 2

NUMBER OF SUBJECTS CHOOSING NEGATIVE ADJECTIVES WHEN ASKED TO DESCRIBE
TYPICAL CLASSROOM FEELINGS

                                  Boys                                    Girls
Adjective       Dissatisfied   Satisfied    Chi Square  Dissatisfied   Satisfied    Chi Square
                    (N = 27)    (N = 25)                    (N = 20)    (N = 20)
Inadequate                19          16    ns                    17           7    10.42**
Ignorant                  19          13    ns                    15           3    14.54**
Dull                      25          16    6.36*                 16           9    [illegible]
Bored                     24          13    8.61**                20          13    [illegible]
Restless                  20          15    ns                    19           9    11.90**
Uncertain                 20          21    ns                    17          13    ns
Angry                     15           4    8.76**                13           4    8.29**
Unnoticed                 19           5    18.25**      [illegible]           4    5.
Unhelped                  18           8    6.24*                  9           6    ns
Misunderstood             16           5    8.31**                 5           2    ns
Rejected                  12           3    6.66**                 4           0    [illegible]
Restrained                17           2    16.91**                9           3    4.

* Significant at the .05 level.
** Significant at the .01 level.

received more favorable judgments than
did “dissatisfied” boys. The fact that this
result does not appear to be true for girls
lends support to the popular expectation
that boys are more likely to express their
negative feelings publicly than are girls.
This hypothesis receives some confirmation
from the results of the adjective check list
which are described below.

In Table 2 are shown the number of Ss


who chose negative adjectives when asked 
to describe their typical classroom feelings. 
As they are arranged in Table 2 the ad- 
jectives reflect the rankings of four judges 
who were asked to rank the words on the 
degree to which they involved an implicit 
or explicit criticism of others. The 12 ad- 
jectives were typed on separate cards and 
were accompanied by the following direc- 
tions: 


On the following cards are a number of 
negative adjectives which a person might 
use to describe himself. Rank these adjec- 
tives on the degree to which they involve 
an implicit or explicit criticism of others. 
For each adjective ask the question: If a 
person used this adjective to describe him- 
self would he also be implicitly or explicitly 
criticizing others? Give a rank of 1 to the 
adjective which would be least critical of 
others and a rank of 12 to the adjective 
which would be most critical of others. 


Four psychologists served as judges. The 
average rank order correlation among the 
four sets of judgments was .84. The adjec- 
tives are presented in Table 2 according to 
the ranked sum-of-ranks of the judges. 
The adjective “inadequate” was judged as 
being most free of criticism of others, 
while the adjective “restrained” was 
judged as involving the greatest amount 
of criticism of others. 

As might be expected, the use of nega- 
tive adjectives was far more frequent 
among dissatisfied students than among 
satisfied students. Four adjectives seemed 
to discriminate equally well between the 
experimental groups for both sexes; these 
were: “bored,” “angry,” “restrained,” and 
“dull.” 
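The chi square values in Table 2 compare, for each adjective, the proportion of dissatisfied and satisfied students who chose it. The article does not say which formula was used; the sketch below assumes the ordinary Pearson chi square for a 2 x 2 table without continuity correction, which reproduces the printed values for the two rows shown. The function name `chi_square_2x2` is illustrative.

```python
def chi_square_2x2(chose_a, n_a, chose_b, n_b):
    """Pearson chi square (1 df, no continuity correction) comparing the
    proportion choosing an adjective in two independent groups."""
    a, b = chose_a, n_a - chose_a      # group A: chose / did not choose
    c, d = chose_b, n_b - chose_b      # group B: chose / did not choose
    n = n_a + n_b
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# "Angry", boys: 15 of 27 dissatisfied vs. 4 of 25 satisfied (Table 2).
angry_boys = chi_square_2x2(15, 27, 4, 25)
# "Restless", girls: 19 of 20 dissatisfied vs. 9 of 20 satisfied (Table 2).
restless_girls = chi_square_2x2(19, 20, 9, 20)

print(round(angry_boys, 2), round(restless_girls, 2))
```

With one degree of freedom the critical values are 3.84 at the .05 level and 6.63 at the .01 level, so both of these comparisons exceed the .01 criterion.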

An examination of Table 2 also suggests
the existence of sex differences in the students’
description of their typical classroom
feelings. Remembering the classificatory
scheme by which the adjectives
are ranked in Table 2, it appears that dissatisfied
girls are somewhat less likely
than dissatisfied boys to use negative adjectives
involving implicit criticism of
others. Dissatisfied boys, on the other
hand, are less likely than dissatisfied girls
to be distinguished from their satisfied
counterparts by the use of adjectives not
involving implicit criticism of others. If
one thinks of criticism directed towards
others within Rosenzweig's schema of “intropunitiveness”
and “extrapunitiveness”
(Murray, 1938), then the observed sex
differences may be conceptualized by saying
that dissatisfied girls are more intropunitive
than satisfied girls; dissatisfied
boys are more extrapunitive than satisfied
boys.

This difference in the direction of aggression
may provide a context for the
obtained differences in teacher ratings discussed
earlier. If the dissatisfied boy is
more likely than his female counterpart to
lay the blame for his dissatisfaction upon
others in his environment, particularly
school authorities, it is reasonable to expect
that he would be viewed as somewhat less
than completely desirable by the classroom
teacher. The dissatisfied girl, on the
other hand, seems more willing to direct
her negative feelings inward, thus avoiding
the additional risk of counter-aggression
by school authorities or by other adults.


Discussion 


Two major conclusions are suggested by 
the findings of this study. First, dissatisfac- 
tion with school appears to be part of a 
larger picture of psychological discontent 
rather than a direct reflection of inefficient 
functioning in the classroom. It is almost 
as if dissatisfaction were a product of a 
pervasive perceptual set that colors the 
student’s view of himself and his world. 
Second, it appears that the “dynamics” of 
dissatisfaction operate differently for boys 
and girls. Boys seem to project the causes 
of their discontent upon the world around 
them so that adults are seen as rejecting 
and lacking in understanding. This tend- 
ency to blame adults may be one reason 
why these boys are seen as less attractive 
by teachers than are satisfied boys. Girls, 
on the other hand, are more likely to be
self-critical, turning blame for their dissatisfaction
inward. Feelings of inadequacy,
ignorance, and restlessness more sharply
differentiate satisfied and dissatisfied girls
than is the case with boys. This tendency
to be intropunitive may partially explain 
why teacher ratings fail to distinguish be- 
tween our two experimental groups of girls. 

The atypicality of the sample popula- 
tion used in this research places a number 
of limitations upon the inferential state- 
ments which can be made on the basis of 
these findings. Fortunately, however, the 
major portion of the investigation has re- 
cently been replicated using seventh and 
eighth grade lower-class Negro adolescents 
as Ss (Spillman, 1959). The findings of the 
latter study are essentially the same as 
those reported here. Again the psychologi- 
cal rather than the intellectual or scho- 
lastic variables discriminated between 
satisfied and dissatisfied students. The findings
with respect to the use of negative
adjectives were not as clear-cut but, again, 
every intropunitive adjective was used 
more frequently by dissatisfied girls as 
compared with dissatisfied boys, while the 
latter exceeded the girls in their use of 
extrapunitive adjectives.

It should be noted that even the most 
satisfied students made some use of nega- 
tive adjectives when asked to describe their 
typical feelings in the classroom. Also, the 
average member of the satisfied group 
expressed some dissatisfaction on one-sixth 
of the questions in the Student Opinion 
Poll. These two observations should serve 
as ample cautions against the danger of 
interpreting any sign of dissatisfaction 
with school as symptomatic of deeper psy- 


chological difficulties. Apparently, some de- 
gree of dissatisfaction is the rule rather 
than the exception. Nonetheless, the responses
of the extremely disgruntled group
of students leave little doubt that dissatisfaction
with school, like beauty, is
frequently in the eye of the beholder. 


SUMMARY 


This investigation examines the differ- 
ences in psychological functioning and 
classroom effectiveness between two groups 
of adolescents—those who are satisfied 
with their recent school experiences and 
those who are dissatisfied. The major findings
point to: (a) the relevance of psychological
health data rather than scholastic
achievement data in understanding
dissatisfaction with school; (b) the importance
of differentiating the attitudes of
dissatisfied girls from those of dissatisfied 
boys, the former being characterized by 
feelings of personal inadequacy, the latter 
by feelings critical of school authorities. 
Rosenzweig’s concepts of intropunitiveness 
and extrapunitiveness are applied to these 
findings and a relevant theoretical frame- 
work is proposed. 


REFERENCES 
HARROWER-ERIKSON, M. R., & STEINER, M. E.
Large scale Rorschach techniques.
Springfield, Ill.: Charles C Thomas,
1945.

MURRAY, H. A. Explorations in personality.
New York: Oxford Univer. Press, 1938.

SPILLMAN, R. J. Psychological and scholastic
correlates of dissatisfaction with school
among adolescents. Unpublished master’s
thesis, Univer. of Chicago, 1959.


(Received July 6, 1959) 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 50, No. 6, 1959 


NOTE TAKING DURING OR AFTER THE LECTURE 


SIGMUND EISNER AND KERMIT ROHDE
Oregon State College 


Though note taking during lecture is 
strongly advocated by most how-to-study 
books, one is hard put to find a theory 
which would advocate the note-taking 
process as a means of learning in and of 
itself. The resulting notes themselves, of 
course, probably are in most cases better 
for study, and hence retention, than no 
notes at all. But the question remains 
whether the student cannot use his time 
during the lecture for some more beneficial 
activity. If the student takes notes after 
the lecture, he is freed during lecture for 
such activity and induced to perform such 
activity in order that he might make notes, 
—and still he has notes available for later 
study. Furthermore such a procedure 
forces an immediate attempt to recall. 
Spitzer (1939) found, working with prose 
passages, that an immediate attempt to 
recall improved retention, and Gates 
(1917), as a result of his retention studies, 
suggested that taking notes after lecture 
might be more conducive to retention than 
taking them during lecture. 


Problem 


The problem was to test the hypothesis 
that taking notes during a lecture is a less 
effective method of retaining lecture mate- 
rial than listening to the lecture and tak- 
ing notes immediately afterward. 


Subjects 

A class of 60 students in a college Eng- 
lish Literature course at the freshman level 
was used as subjects. 


PROCEDURE 


On the basis of the instructor’s grades 
and impression during the previous three 
weeks, members of the class were paired 
for ability. The class was then split into 


two groups, the only stipulation being that 
each group contain one member of each 
pair. 

Two 30-min. lectures were given to the 
class: one a lecture on the short story, 
and one about four weeks later on roman- 
ticism. During the first lecture, Group I 
was told to take notes during the lecture 
and then to study them for the 15 min. 
immediately following, and Group II was 
told to take no notes during the lecture, 
but to listen and be prepared to take notes 
and study them immediately after the lec- 
ture. To prevent the complicating effects 
which might arise from varied amounts of 
additional study, the notes were collected 
after the 15-min. period following the lec- 
ture. This same procedure was followed 
during the second lecture except that 
Group I took notes after and Group II 
during the lecture. 
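The design just described combines matched pairs with a crossover: each group meets both note-taking conditions, in opposite order. A minimal sketch of the assignment step follows. The text says only that each group must contain one member of each pair; choosing at random which member goes to which group is an assumption made here, and the student labels and function name are hypothetical.

```python
import random


def split_matched_pairs(pairs, seed=None):
    """Place one member of each ability-matched pair in each group,
    choosing at random which member goes where (an assumption)."""
    rng = random.Random(seed)
    group_1, group_2 = [], []
    for first, second in pairs:
        if rng.random() < 0.5:
            first, second = second, first
        group_1.append(first)
        group_2.append(second)
    return group_1, group_2


# Crossover schedule described in the text: each group meets both conditions.
SCHEDULE = {
    "lecture 1 (short story)": {"Group I": "notes during", "Group II": "notes after"},
    "lecture 2 (romanticism)": {"Group I": "notes after", "Group II": "notes during"},
}

# Hypothetical student labels, already paired for ability.
pairs = [("s01", "s02"), ("s03", "s04"), ("s05", "s06")]
group_1, group_2 = split_matched_pairs(pairs, seed=0)
print(group_1, group_2)
```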

The specific instructions were given as 
follows: (The material in brackets was de- 
leted at the second lecture period and that 
in parentheses was deleted at the first lec- 
ture period.) 


Today we are going to conduct (the sec- 
ond part of) an experiment on note taking 
(in order to provide a control). (You will re- 
member) What we wish to find out is 
whether it is more effective to take notes 
during lecture or immediately after lecture. 
In a few minutes, therefore, Dr. E. will di- 
vide the class into two groups, one of which 
is to take notes during the lecture as usual, 
and the other is to concentrate intensely on 
the lecture while it is being given and then 
immediately after the lecture jot down all 
the notes you can. (Each of you will be in 
the opposite group from last time.) After 
Dr. E. has divided the class, he will give the 
lecture—a 30-min. lecture on the [short 
story] (novel)—and then there will be a 
15-min. period of silence in order for you to 
jot down and study your notes, or just to 
study your notes, depending upon which 
group you are in. Then you will sign and 




hand in your notes, which we will study for 
the purposes of our experiment and return 
to you at a later date. Part of the next meet- 
ing will be devoted to a test covering this 
material. Though the test results will not af- 
fect your grade, please try to do your best 
on this test. Are there any questions before 
we divide the class? 


The names in each group were then read
and the lecture given. After the lecture,
Dr. E. said:


All right, those of you who have notes 
study them, and the others jot down any 
notes you wish, and study them. 


At the end of the 15 min. Dr. E. said: 


Will you please sign your notes and hand 
them in? 


Two days later, the class was given a
50-item true-false test on the lecture and
an essay question with the following instructions:


This is the test I mentioned at our last 
meeting. Though it will not count on your 
grade, please do your best. 


For the lecture on romanticism the essay 
question was: 


You have heard a lecture which defines 
romanticism. According to this lecture what 
are the romantic elements in Huckleberry 
Finn? (The class had just finished reading
Huckleberry Finn.)

For the short story lecture, the question 
was: 

If fiction writers are interested in truth, 
and if fiction is the opposite of fact, in that
fact has occurred while fiction was invented,
why do seekers of truth write fiction? 


Three weeks after the second lecture, 
the class was given an unannounced test 
(hereinafter referred to as the “late” test) 
containing 100 true-false items, 50 on the 
short story lecture and 50 on the romanti- 
cism lecture. 

All true-false tests were scored in terms
of numbers of errors; and, with the names
removed so one could not tell from which
group they came, the essays on each question
were ranked by the lecturer according
to the degree they showed a grasp of his
material.

Present on all of the five days during 
which this experiment was in progress 
were 24 members from Group I and 22 
members from Group II. One member of 
Group I was dropped because he skipped 
a page of the test. Another member chosen 
at random was eliminated from Group I 
in order to make it equal in size to Group
II.


RESULTS 


Three tests were made with the data. 

First for each student his score on the 
true-false items covering the lecture dur- 
ing which he took notes was compared 
with his score on the items covering the 
lecture after which he took notes. The re- 
sults are given in Table 1. 
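Because every student was tested under both conditions, each student's two error scores can be compared directly. The following is a minimal sketch of the paired t statistic for such data. The error counts are hypothetical and are not taken from the study; the function name `paired_t` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(errors_during, errors_after):
    """t statistic for paired observations: each student's error count
    under one condition minus the same student's count under the other."""
    diffs = [d - a for d, a in zip(errors_during, errors_after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1  # (t, degrees of freedom)


# Hypothetical error counts for five students (not data from the study).
during = [11, 9, 14, 12, 10]
after = [12, 9, 13, 14, 11]
t, df = paired_t(during, after)
print(round(t, 3), df)
```

Working with the within-student differences removes ability differences between students from the error term, which is the purpose of testing each student under both conditions.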

Since no significant differences appeared
between simultaneous and delayed note
taking on either the early or late test, it
was decided to check on the possibility
that a given note taking procedure would
favor a student at a high level of ability
but impede a student of a low level. The
students were divided into two subgroups
on the basis of their end of term
grade in the course. This procedure placed
into each of the experimental groups 10
good (A and B) students and 12 poor
students. Again no significant differences
between simultaneous and delayed note
taking were found in either the early or
the late test within either the good or the
poor student group.

TABLE 1

ERRORS ON TRUE-FALSE TEST

                            All Students           Good Students          Poor Students
                              (N = 44)               (N = 20)               (N = 24)
Test                        M      SD     tᵃ       M      SD     tᵃ       M      SD     tᵃ
Early:
  Notes during lecture    11.4    4.4             9.2    4.8            13.3    3.1
                                         0.90                   1.43                   0.24
  Notes after lecture     12.0    3.7            10.9    3.7            13.0    3.5
Late:
  Notes during lecture    14.0    4.6            11.8    3.9            15.9    4.2
                                         0.00                   0.73                   0.72
  Notes after lecture     14.0    4.1            12.5    3.2            15.3    4.3

ᵃ Individuals were paired with themselves for the t test.


Secondly, to check on the possibility
that certain kinds of items were better retained
under one method than the other,
a difference score between the immediate
note takers and the delayed note takers
was computed for each item. This was
done using the 44 Ss appearing for all
parts of the experiment except for the
first short story test. Since there were 27
Ss in each group who had heard this lecture
and taken the early test, all 54 were
used for this analysis.

Of the 200 items, only 10 showed signifi- 
cant differences at the .05 level between 
immediate and delayed note takers as de- 
termined by the Chi square test. Since one 
would expect by chance exactly 10 ques- 
tions to show such differences, this line of 
investigation was concluded. It was noted, 
however, that the five questions on which 
the immediate note takers were superior 
were the more general, and the five on 
which the delayed note takers were supe- 
rior were the more detailed questions; a 
trend contrary to what one might expect. 
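The chance expectation cited above follows from the number of items and the significance level. The sketch below computes it and, under the added assumption that the 200 item tests are independent, also estimates how often 10 or more items would reach significance by chance alone. The figure of 10 is an average, so the actual count would vary from one replication to another.

```python
from math import comb

N_ITEMS = 200   # true-false items examined
ALPHA = 0.05    # significance level used for each item

expected_by_chance = N_ITEMS * ALPHA


def prob_at_least(k, n=N_ITEMS, p=ALPHA):
    """P(at least k of n independent null tests reach significance)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


print(expected_by_chance)           # expected number of chance differences
print(round(prob_at_least(10), 2))  # chance of seeing 10 or more of them
```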

Finally, the 54 essay answers on the 
early short story test, with the names re- 
moved, were ranked by the lecturer ac- 
cording to the degree of understanding of 
the lecture evidenced in the essay. Next 
the 44 essay answers on the early roman- 
ticism test were similarly treated. Then 
this data was analyzed by the Mann-Whit- 
ney test to determine whether the delayed 
note takers remembered the kind of in- 
formation necessary to write such a test 
better than the immediate note takers. In 
both tests the immediate note takers 
ranked higher. But as the value for U in
the first test was only 383 (mean U = 365,
σU = 57.8) and in the second test only
263 (mean U = 242, σU = 42.6), neither
value approached significance at the .05
level of confidence.
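The means and standard deviations of U quoted above agree with the usual large-sample formulas for the Mann-Whitney statistic. The sketch below assumes those formulas (without correction for ties) and an even split of the essays between the two conditions, 27 and 27 for the first test and 22 and 22 for the second; the function names are illustrative.

```python
from math import sqrt


def u_null_moments(n1, n2):
    """Mean and SD of the Mann-Whitney U statistic under the null
    hypothesis (large-sample formulas, no correction for ties)."""
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return mean_u, sd_u


def z_for_u(u, n1, n2):
    """Normal-approximation z score for an observed U."""
    mean_u, sd_u = u_null_moments(n1, n2)
    return (u - mean_u) / sd_u


# Assumption: the 54 and 44 essays split evenly between the two conditions.
print(u_null_moments(27, 27), round(z_for_u(383, 27, 27), 2))
print(u_null_moments(22, 22), round(z_for_u(263, 22, 22), 2))
```

Both z scores fall well below 1.96, the two-tailed criterion at the .05 level, which is consistent with the statement that neither value approached significance.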


Discussion


When one considers that note taking
has, with respect to most students, an ad- 
vantage of many years of practice over 
any other method, it is surprising to find 
that note taking during lecture in and of 
itself fails to show any superiority to de- 
layed note taking. Past experience was not 
without effect as a number of students in 
the present experiment were very upset 
at not being allowed to take notes during 
lecture even though grades were not involved.
The question suggests itself: if
they had been similarly experienced in delayed
note taking, would this method have
shown itself superior, as most theory would
suggest?

The speculation that, at least for the 
student who has a great deal of difficulty 
concentrating on a lecture, note taking is 
an aid is not supported in this study. One 
would expect students at this level of in- 
terest to be the poorer students, but the 
results for the poorer students taken sepa- 
rately did not favor immediate note tak- 
ing. 


SUMMARY 


Students in a college English literature 
class were given two lectures. During one 
they took notes and studied afterward; 
during the other they listened and then 
took notes and studied afterward. Several 
days later they were given the “early” 




test of true-false items and an essay ques- 
tion; three weeks later the "late" test of 
true-false items only. The data on 44 stu- 
dents gave no significant differences be- 
tween the two methods on either the early 
or the late true-false test (either for the 
total group or for good and poor students 
taken separately). Nor did the comparison
of the essay tests from the “during” and
“after” groups yield a significant difference.
Hence, no evidence was found to support
the belief that note taking itself during
lecture is any more effective than note
taking immediately after lecture. 


REFERENCES 


GATES, A. I. Recitation as a factor in memorizing.
Arch. Psychol., 1917, 6, No. 40.

SPITZER, H. F. Studies in retention. J. educ.
Psychol., 1939, 30, 641-656.


(Received May 14, 1959) 


